CN116977812A - Image data detection method, device, equipment and medium


Info

Publication number
CN116977812A
Authority
CN
China
Prior art keywords
sample
stage
virtual
detection model
data set
Prior art date
Legal status
Pending
Application number
CN202211418837.9A
Other languages
Chinese (zh)
Inventor
孙可
陈燊
姚太平
陈阳
丁守鸿
纪荣嵘
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202211418837.9A
Publication of CN116977812A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application provides an image data detection method, device, equipment and medium, which can be applied to the field of image detection. The method includes: acquiring the image detection model trained in the t-1 stage, and acquiring the disturbance vector set of the image detection model in the t-1 stages; combining the disturbance vectors in the disturbance vector set with the real samples of the t-th stage to obtain virtual samples of the t-th stage, and acquiring false samples of the t-th stage; correcting the network parameters of the image detection model trained in the t-1 stage according to the real samples, virtual samples and false samples of the t-th stage to obtain a target detection model; the target detection model is used for detecting the authenticity of a target object contained in a source image. With the embodiment of the application, the historical feature distribution from earlier training stages is retained, which in turn improves the detection accuracy of the image detection model.

Description

Image data detection method, device, equipment and medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method, an apparatus, a device, and a medium for detecting image data.
Background
With the progress of machine learning and computer vision technologies, deepfake technology has also developed rapidly. Deep forgery refers to creating or synthesizing audiovisual content (e.g., images, video, audio, text) with intelligent methods such as deep learning. The abuse of deepfake data brings a great number of potential safety and privacy hazards, so detection tasks targeting deepfake data are attracting more and more attention.
In current deep forgery detection scenarios, features of a video/image can be extracted by a deep learning model to obtain a detection result for that video/image. However, new deepfake data keeps appearing as forgery techniques are updated; because the deep learning model never encountered these new forgery types during training, its accuracy on newly forged data can be too low.
Disclosure of Invention
The embodiment of the application provides an image data detection method, an image data detection device, image data detection equipment and an image data detection medium, which can keep the historical characteristic distribution during the training in the previous stage, and further can improve the detection accuracy of an image detection model.
In one aspect, an embodiment of the present application provides an image data detection method, including:
acquiring an image detection model trained in the t-1 stage, and acquiring a disturbance vector set corresponding to the image detection model; the disturbance vector in the disturbance vector set is used for representing the historical characteristic distribution of the image detection model in t-1 stages, different stages correspond to different false samples, and t is an integer greater than 1;
combining the disturbance vector in the disturbance vector set with the real sample of the t stage to obtain a virtual sample of the t stage, and obtaining a false sample of the t stage;
correcting network parameters of the image detection model trained in the t-1 stage according to the real sample of the t stage, the virtual sample of the t stage and the false sample of the t stage to obtain a target detection model; the target detection model is an image detection model trained in the t-th stage, and is used for detecting the authenticity of a target object contained in the source image.
An aspect of an embodiment of the present application provides an image data detection apparatus, including:
the set acquisition module is used for acquiring an image detection model trained in the t-1 stage and acquiring a disturbance vector set corresponding to the image detection model; the disturbance vector in the disturbance vector set is used for representing the historical characteristic distribution of the image detection model in t-1 stages, different stages correspond to different false samples, and t is an integer greater than 1;
The sample construction module is used for combining the disturbance vector in the disturbance vector set with the real sample of the t stage to obtain a virtual sample of the t stage and obtain a false sample of the t stage;
the parameter correction module is used for correcting network parameters of the image detection model trained in the t-1 stage according to the real sample of the t stage, the virtual sample of the t stage and the false sample of the t stage to obtain a target detection model; the target detection model is an image detection model trained in the t-th stage, and is used for detecting the authenticity of a target object contained in the source image.
Wherein, the collection acquisition module includes:
the disturbance vector initialization unit is used for acquiring an initial disturbance vector of the image detection model in a t-1 stage in the t-1 stages, and combining the initial disturbance vector of the t-1 stage and a real sample of the t-1 stage into an initial virtual sample;
the virtual sample prediction unit is used for inputting the initial virtual sample into the image detection model of the t-1 stage and outputting a first sample prediction value corresponding to the initial virtual sample through the image detection model of the t-1 stage;
The disturbance vector updating unit is used for correcting the initial disturbance vector according to the predicted value of the first sample and the label information corresponding to the real sample of the t-1 stage to obtain a disturbance vector of the t-1 stage, and adding the disturbance vector of the t-1 stage to the disturbance vector set.
The disturbance vector updating unit is specifically configured to:
determining label information corresponding to an initial virtual sample according to label information corresponding to a real sample in the t-1 stage;
determining a first cross entropy loss corresponding to the initial virtual sample according to the first sample predicted value and label information corresponding to the initial virtual sample;
and obtaining a gradient value of the first cross entropy loss, and carrying out iterative updating on the initial disturbance vector based on the initial disturbance vector and the gradient value of the first cross entropy loss to obtain a disturbance vector of the t-1 stage.
Wherein the sample construction module comprises:
the disturbance vector sampling unit is used for sampling disturbance vectors of all stages contained in the disturbance vector set to obtain a target disturbance vector;
the virtual sample construction unit is used for carrying out summation operation on the target disturbance vector and the real sample of the t stage to obtain a virtual sample of the t stage, and setting label information for the virtual sample of the t stage.
Wherein, the parameter correction module includes:
the sample data prediction unit is used for determining a real sample of the t-th stage, a virtual sample of the t-th stage and a false sample of the t-th stage as a sample data set of the t-th stage, and outputting a second sample predicted value corresponding to sample data in the sample data set through an image detection model trained by the t-1-th stage;
the cross entropy loss determining unit is used for determining a second cross entropy loss of the t-th stage according to the total number of samples in the sample data set, a second sample predicted value corresponding to the sample data in the sample data set and label information corresponding to the sample data in the sample data set;
the virtual entropy loss determining unit is used for determining the virtual entropy loss of the t-th stage according to the number of virtual samples in the sample data set, the second sample predicted value corresponding to the virtual samples in the sample data set and the label information corresponding to the virtual samples in the sample data set;
a first error loss determining unit, configured to determine a virtual mean square error loss at the t-th stage according to the number of virtual samples in the sample data set, a sample description feature of the virtual samples in the sample data set at the t-th stage, and a sample description feature of the virtual samples in the sample data set at the t-1 th stage;
A second error loss determining unit, configured to determine a true mean square error loss at the t-th stage according to the sample description feature of the true sample in the sample data set at the t-th stage and the sample description feature of the true sample in the sample data set at the t-1 stage;
the model loss determining unit is used for determining model loss of the t stage according to the second cross entropy loss, the virtual mean square error loss and the real mean square error loss, and correcting network parameters of the image detection model trained in the t-1 stage based on the model loss of the t stage to obtain a target detection model.
The cross entropy loss determining unit is specifically configured to:
carrying out logarithmic processing on the difference value between the target constant and the second sample predicted value corresponding to the sample data in the sample data set to obtain a first logarithmic value;
determining a difference value between the target constant and label information corresponding to sample data in the sample data set as a label difference value, and determining a product between the label difference value and a first logarithmic value as a first product value;
carrying out logarithmic processing on the sum of the first product value and a second sample predicted value corresponding to sample data in the sample data set to obtain a second logarithmic value;
Accumulating the product of the label information corresponding to the sample data in the sample data set and the second logarithmic value to obtain a sample accumulated value corresponding to the sample data set;
a second cross entropy penalty for the t-th stage is determined based on the ratio between the sample accumulation value and the total number of samples in the sample dataset.
The virtual entropy loss determining unit is specifically configured to:
carrying out logarithmic processing on a second sample predicted value corresponding to the virtual sample in the sample data set to obtain a third logarithmic value;
accumulating the product of the label information corresponding to the virtual sample in the sample data set and the third logarithmic value to obtain a virtual accumulated value corresponding to the sample data set;
and obtaining the virtual entropy loss of the t-th stage according to the ratio between the virtual accumulated value and the virtual sample number in the sample data set.
Wherein the first error loss determination unit is specifically configured to:
outputting sample description characteristics of virtual samples in the sample data set in the t-1 stage through the image detection model of the t-1 stage;
outputting sample description characteristics of the virtual samples in the sample data set in the t stage through the image detection model of the t stage;
acquiring sample description characteristics of a virtual sample in a sample data set at a t-th stage and virtual sample errors between the sample description characteristics of the virtual sample in the sample data set at the t-1 th stage;
And determining the virtual mean square error loss of the t stage according to the virtual sample error corresponding to the virtual samples in the sample data set and the number of the virtual samples in the sample data set.
Wherein the second error loss determination unit is specifically configured to:
outputting sample description characteristics of real samples in the sample data set in the t-1 stage through the image detection model in the t-1 stage;
outputting sample description characteristics of real samples in a sample data set in a t stage through an image detection model in the t stage;
acquiring sample description characteristics of real samples in a sample data set at a t-th stage and real sample errors of the real samples in the sample data set between the sample description characteristics of the real samples in the t-1 th stage;
and accumulating the real sample errors corresponding to the real samples in the sample data set to obtain the real mean square error loss of the t stage.
The model loss determining unit determines a model loss of a t-th stage according to the second cross entropy loss, the virtual mean square error loss and the real mean square error loss, and includes:
obtaining a balance coefficient associated with the virtual mean square error loss and the real mean square error loss, and determining the product between the virtual mean square error loss and the real mean square error loss and the balance coefficient as the sample mean square error loss of the t-th stage;
And carrying out summation operation on the second cross entropy loss, the virtual entropy loss and the sample mean square error loss to obtain model loss in the t-th stage.
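Read together, these units describe a stage-t model loss that combines a cross-entropy term over all sample data, an entropy-style term over virtual samples only, and two feature-level mean square error terms weighted by the balance coefficient. The sketch below is one plausible PyTorch reading of that claim language; the binary cross-entropy form, the mean/sum accumulation conventions, and the name lam for the balance coefficient are assumptions rather than the patent's exact formulas.

```python
import torch

def stage_t_model_loss(pred, label, pred_v, label_v,
                       feat_v_t, feat_v_prev, feat_r_t, feat_r_prev,
                       lam=1.0, eps=1e-8):
    """One reading of the stage-t model loss described by the units above.
    pred/label cover every sample in the stage-t sample data set; the *_v
    tensors cover virtual samples only; feat_* are sample description
    features from the stage-t and stage t-1 models. lam is the assumed
    name of the balance coefficient."""
    # second cross entropy loss: averaged over the total number of samples
    l_ce = -(label * torch.log(pred + eps)
             + (1 - label) * torch.log(1 - pred + eps)).mean()
    # virtual entropy loss: accumulated over virtual samples, divided by their count
    l_ve = -(label_v * torch.log(pred_v + eps)).mean()
    # virtual mean square error loss: averaged over the virtual-sample count
    l_mse_v = ((feat_v_t - feat_v_prev) ** 2).mean()
    # real mean square error loss: accumulated (summed) over real samples
    l_mse_r = ((feat_r_t - feat_r_prev) ** 2).sum()
    # balance coefficient weights the two mean square error terms
    return l_ce + l_ve + lam * (l_mse_v + l_mse_r)
```

Whether the balance coefficient multiplies the sum of the two error losses or each loss individually is ambiguous in the claim language; the sum form above is the conventional choice for such a weighting.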
Wherein the apparatus further comprises:
the feature extraction module is used for acquiring a source image containing a target object, inputting the source image into the target detection model, and outputting object description features corresponding to the source image through a feature extraction component in the target detection model;
the feature recognition module is used for recognizing the object description features through a classifier in the target detection model and outputting predicted probability values corresponding to the object description features;
and the detection result determining module is used for determining a detection result corresponding to the target object contained in the source image according to the prediction probability value.
The detection result determining module is specifically configured to:
if the predicted probability value is greater than or equal to the probability threshold value, determining that a detection result corresponding to the target object contained in the source image is a false object;
if the predicted probability value is smaller than the probability threshold value, determining that the detection result corresponding to the target object contained in the source image is a real object.
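A one-line sketch of this thresholding rule; the 0.5 default is an illustrative choice, as the patent does not fix the probability threshold.

```python
def detect(prob: float, threshold: float = 0.5) -> str:
    """Map the predicted probability value output by the target detection
    model to a detection result: at or above the probability threshold the
    target object is a false object, below it a real object."""
    return "false object" if prob >= threshold else "real object"
```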
An aspect of an embodiment of the present application provides a computer device, including a memory and a processor, where the memory is connected to the processor, and the memory is used to store a computer program, and the processor is used to call the computer program, so that the computer device performs the method provided in the foregoing aspect of the embodiment of the present application.
An aspect of an embodiment of the present application provides a computer readable storage medium, in which a computer program is stored, the computer program being adapted to be loaded and executed by a processor, to cause a computer device having a processor to perform the method provided in the above aspect of an embodiment of the present application.
According to one aspect of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the method provided in the above aspect.
In the embodiment of the application, an image detection model trained in the t-1 stage can be obtained, and a disturbance vector set corresponding to the image detection model is obtained, wherein disturbance vectors in the disturbance vector set are used for representing the historical characteristic distribution of the image detection model in the t-1 stage, different stages correspond to different false samples, and t is an integer greater than 1; the disturbance vector in the disturbance vector set and the real sample of the t stage can be combined to obtain a virtual sample of the t stage, and the obtained virtual sample can keep the historical characteristic distribution of the image detection model during the training of the previous stage and reduce the storage space of the image detection model during the training process; and correcting network parameters of the image detection model trained in the t-1 stage through the real sample, the false sample and the virtual sample in the t stage, so that a target detection model trained in the t stage can be obtained. In other words, in the training process of the t-th stage, when the new false sample in the t-th stage is trained, the training process is performed together with the virtual sample of the t-th stage, so that the historical characteristic distribution of the previous t-1 stages can be reserved, and the detection accuracy of the image detection model can be improved.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic structural diagram of a network architecture according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of an image detection model in a training process according to an embodiment of the present application;
fig. 3 is a flowchart of an image data detection method according to an embodiment of the present application;
FIG. 4 is a schematic view of optimizing disturbance vectors at stage t-1 according to an embodiment of the present application;
FIG. 5 is a flowchart of another image data detection method according to an embodiment of the present application;
FIG. 6 is a training schematic diagram of an image detection model according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a detection scenario based on a target detection model according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a data authentication scene of an object detection model according to an embodiment of the present application;
Fig. 9 is a schematic structural diagram of an image data detecting device according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
For easy understanding, the following describes the basic technical concept related to the embodiment of the present application:
computer Vision (CV) is a science of how to "look" at a machine, and more specifically, to replace the machine Vision such as human eyes with a camera and a Computer to identify, locate and measure a target, and further perform graphic processing to make the Computer process an image more suitable for human eyes to observe or transmit to an instrument for detection. As a scientific discipline, computer vision research-related theory and technology has attempted to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision techniques may generally include image processing, image recognition, image detection, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D techniques, virtual reality, augmented reality, synchronous positioning and map construction, autopilot, intelligent transportation, etc., as well as common biometric techniques such as face recognition, fingerprint recognition, etc.
The application relates to image detection under computer vision: false images containing forged objects (referred to as false samples) are fed to an image detection model for training in time order. For example, false samples can be obtained chronologically and the image detection model trained stage by stage on them, yielding a target detection model whose training for the current stage is complete. The trained target detection model can then be used to detect whether the target object contained in an image is a real object, with improved detection accuracy. The target object may include, but is not limited to: identity documents, specific articles (e.g., famous paintings, antiques and other collectibles), and human body parts (e.g., faces); the application does not limit the type of target object. A real object refers to an object that actually exists in the real world, while a false object refers to an object synthesized by forgery techniques.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a network architecture provided in an embodiment of the present application, where the network architecture may include a server 10d and a terminal cluster, and the terminal cluster may include one or more terminal devices, where the number of terminal devices included in the terminal cluster is not limited. As shown in fig. 1, the terminal cluster may specifically include a terminal device 10a, a terminal device 10b, a terminal device 10c, and the like; all terminal devices in the terminal cluster (which may include, for example, terminal device 10a, terminal device 10b, and terminal device 10c, etc.) may be in network connection with the server 10d, so that each terminal device may interact with the server 10d through the network connection.
The server 10d may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a content delivery network (Content Delivery Network, CDN), basic cloud computing services such as big data and an artificial intelligence platform, and the type of the server is not limited in the present application.
The terminal devices of the terminal cluster may include, but are not limited to: the application relates to electronic equipment with image detection functions, such as smart phones, tablet computers, notebook computers, palm computers, mobile internet equipment (mobile internet device, MID), wearable equipment (such as smart watches, smart bracelets and the like), intelligent voice interaction equipment, intelligent household appliances (such as smart televisions and the like), vehicle-mounted equipment, aircrafts and the like, and the type of the terminal equipment is not limited.
As shown in fig. 1, the terminal devices in the terminal cluster may integrate an application client with an image detection function, where the application client may include, but is not limited to: multimedia clients (e.g., short video clients, live video clients, video clients), entertainment clients (e.g., game clients), social clients (e.g., instant messaging application clients, office clients), financial clients (e.g., banking clients), transportation clients, and the like. Taking the terminal device 10a in the terminal cluster shown in fig. 1 as an example, the terminal device 10a may acquire an image including a false object (may be used as a false sample), and stage-divide the false sample according to a time sequence, where the stage-divided false sample may be used as sample data of different stages to train the image detection model in sequence, that is, the false samples of each stage are different, and training of each stage is completed according to the time sequence.
It will be appreciated that, assuming the training process of the image detection model includes t stages, t represents the total number of stages during training and may be an integer greater than 1, e.g., t may take values 2, 3, and so on. The t stages can use the same real samples (images containing real objects) and different false samples for model training, i.e., the t stages can share the same real samples; meanwhile, each stage can keep the historical feature distribution of the previous stages (which can be understood as the feature distribution of the false samples in the previous stages) and perform joint training with the real samples and false samples, yielding the finally trained target detection model, where the target detection model is the image detection model trained over the t stages.
After the terminal device 10a acquires the target detection model, the target detection model may be issued in an application client integrated with the terminal device 10a, so that the target detection model may be applied in the application client. For example, the terminal device 10a may detect a source image containing a target object in the application client by using the target detection model, to obtain a detection result corresponding to the target object in the source image; the detection result may include a real object and a false object, and if the detection result is the real object, the detection result indicates that the target object included in the source image is detected as the real object; if the detection result is a false object, the target object contained in the source image is detected as the false object; by the target detection model, the detection accuracy of the target object in the source image can be improved.
It should be noted that, the training process of the image detection model and the application process of the trained image detection model (target detection model) may be performed by a computer device, that is, the image data detection method provided in the embodiment of the present application may be performed by a computer device, and the computer device may be a server 10d in the network architecture shown in fig. 1, or any one of terminal devices in the terminal cluster, or may be a computer program (including program code, for example, an application client integrated by the terminal device), which is not limited in the embodiment of the present application.
Referring to fig. 2, fig. 2 is a schematic structural diagram of an image detection model in a training process according to an embodiment of the present application. Taking the terminal device 10a in the network architecture shown in fig. 1 as an example, a training process of the image detection model is described; it will be appreciated that, before training the image detection model, sample data for training the image detection model may be acquired, where the sample data may include a real sample and a false sample, each sample data may carry tag information, such as tag information carried by the real sample may be referred to as a real tag, and tag information carried by the false sample may be referred to as a false tag, where the real tag and the false tag may be represented in the form of numerical values, characters, and the like, which is not limited in this disclosure.
As shown in fig. 2, the terminal device 10a may input the acquired false samples to the image detection model stage by stage in time order, and perform model training based on the input false samples. The training process of the image detection model may include t stages, and the false samples of each stage are all different; to be closer to reality, the real samples can be shared across the t stages, i.e., training in all t stages may use the same real samples, since real samples are easy to obtain in the real world. The initialized image detection model can be recorded as model f_0. The terminal device 10a can use the real sample 20a and the false sample 20b as sample data of the image detection model at the 1st stage, and train model f_0 with the real sample 20a and the false sample 20b to obtain the image detection model of the 1st stage (which can be denoted model f_1); in other words, the training process of the 1st stage can be understood as the process of updating model f_0 to model f_1.

Further, the terminal device 10a may use the real sample 20a and the false sample 20c as sample data of the image detection model at the 2nd stage, and train model f_1 with them to obtain the image detection model of the 2nd stage (which can be denoted model f_2); in other words, the initial image detection model of the 2nd stage is model f_1, and the training process of the 2nd stage can be understood as the process of updating model f_1 to model f_2. Similarly, the sample data of the t-th stage may include the real sample 20a and the false sample 20d, the image detection model trained in the t-th stage may be denoted model f_t, and the training process of the t-th stage can be understood as updating model f_{t-1} (the image detection model trained in the t-1 stage) to model f_t. The image detection model trained in the t-th stage can be regarded as the target detection model of the t-th stage, and this target detection model can produce accurate detection results on the false samples from the 1st stage through the t-th stage.
It should be noted that, to accelerate the training of the image detection model, different false samples are used in each stage, i.e., each stage only needs the newly input false samples and does not need the false samples of previous stages; in the training of the current stage (e.g., the t-th stage), the image detection model cannot touch the false samples of the previous stages (e.g., the previous t-1 stages). The historical feature distribution of each stage can nonetheless be preserved, for example by computing a universal adversarial perturbation (UAP). With the UAP of each stage and the real samples of the current stage, virtual samples similar to the false samples of the previous stages (i.e., similar to the historical feature distribution) can be constructed; the constructed virtual samples are then trained jointly with the real samples and false samples of the current stage to obtain the image detection model of the current stage. This alleviates the forgetting problem of the image detection model, improves its generalization, and thus improves its detection accuracy. The real samples and false samples in the embodiment of the application are image data acquired in the real world, whereas virtual samples are sample data created during the training of the image detection model.
It can be understood that, if the terminal device 10a continues to acquire new false samples after the target detection model is trained, it may continue training the target detection model: for example, the new false samples and the real sample 20a can be used as the sample data of the t+1 stage, and the target detection model trained with them to obtain the image detection model of the t+1 stage. In other words, the image detection model in the embodiment of the application can be trained continually; whenever the terminal device 10a acquires new false samples, it can continue training the currently trained image detection model to obtain the latest image detection model.
Referring to fig. 3, fig. 3 is a flowchart of an image data detection method according to an embodiment of the present application; it will be appreciated that the image data detection method may be performed by a computer device, which may be a server, a terminal device, or a computer program (including program code); the application is not limited in this respect. As shown in fig. 3, the image data detection method may include the following steps S101 to S103:
Step S101, acquiring an image detection model trained in the t-1 stage, and acquiring a disturbance vector set corresponding to the image detection model; the disturbance vectors in the disturbance vector set are used for representing the historical characteristic distribution of the image detection model in t-1 stages, different stages correspond to different false samples, and t is an integer greater than 1.
Specifically, the training process of the image detection model may include t stages, where t may be an integer greater than 1. Since the training process of the image detection model is the same in every stage, for ease of understanding the training of the t-th stage is taken as an example. After the computer equipment trains the image detection model of the t-1 stage, if new false samples are obtained, the image detection model can be determined to enter the t-th stage. In the t-th stage, the image detection model trained in the t-1 stage can be obtained, along with the disturbance vector set corresponding to the image detection model; the set may include the disturbance vectors of the image detection model in each of the t-1 stages, and the disturbance vectors in the set may be used to characterize the historical feature distribution of the image detection model in the t-1 stages. For example, the disturbance vector may be a universal adversarial perturbation (UAP), by which a real sample can be forged into a false sample.
Optionally, the image detection model obtains one disturbance vector in each stage of training; with this disturbance vector, the image detection model can construct a real sample into a virtual sample. One stage corresponds to one disturbance vector, and the disturbance vector of each stage may be added to the disturbance vector set. Through the disturbance vector of each stage, the historical feature distribution of that stage (e.g., the feature distribution corresponding to its false samples) can be maintained. The disturbance vectors of the t-1 stages are computed in the same way; for ease of understanding, the disturbance vector of the t-1 stage (the latest of the previous t-1 stages) is taken as an example.
In the embodiment of the application, the image detection model trained in the t-1 stage is denoted model f_{t-1}. An initial disturbance vector of the image detection model in the t-1 stage is acquired, and the initial disturbance vector of the t-1 stage and a real sample of the t-1 stage are combined into an initial virtual sample. The initial virtual sample is input into the image detection model of the t-1 stage, which outputs the first sample predicted value corresponding to the initial virtual sample; the initial disturbance vector is then corrected according to the first sample predicted value and the label information corresponding to the real sample of the t-1 stage, yielding the disturbance vector of the t-1 stage, which is added to the disturbance vector set.
In each stage of training, a disturbance vector needs to be found such that the image detection model treats a real sample as a false sample (the false sample constructed from the disturbance vector is referred to as a virtual sample); that is, the predicted value that model f_{t-1} outputs for the virtual sample should differ from the label information of the corresponding real sample. The relationship between the label information of the virtual sample and that of the real sample can be shown as formula (1):

Pred(f_{t-1}(x_r + p_{t-1}; θ)) = y_f ≠ y_r,  s.t.  ||p_{t-1}||_∞ ≤ ε    (1)

where Pred maps the output features of model f_{t-1} to the first sample predicted value; x_r denotes a real sample of the t-1 stage and y_r the label information of real sample x_r; p_{t-1} ∈ R^{C×H×W} denotes the disturbance vector of the t-1 stage, i.e., p_{t-1} has dimensions C×H×W, with C the number of channels, H the height, and W the width; θ denotes the model parameters; and ε is used to control the size of the disturbance vector.
The embodiment of the application treats the disturbance vector p_{t-1} as an effective feature of the t-1 stage and optimizes p_{t-1} by gradient descent. The gradient descent here may be based on cross entropy loss, or other optimization methods may be used, such as the conjugate gradient method, Newton and quasi-Newton methods, or stochastic gradient methods; the application does not limit this.

For ease of understanding, the following optimizes the disturbance vector p_{t-1} by gradient descent based on cross entropy loss. Before optimizing p_{t-1}, it can be initialized to obtain the initial disturbance vector; the initialization may be a matrix of all zeros, a matrix of all ones, an identity matrix (diagonal elements 1, remaining elements 0), a random matrix (each element a random number), etc. The application does not limit the initialization of p_{t-1}.
The optimization process of the disturbance vector p_{t-1} may include, but is not limited to: determining the label information corresponding to the initial virtual sample according to the label information corresponding to the real samples of the t-1 stage; determining the first cross entropy loss corresponding to the initial virtual sample according to the first sample predicted value and the label information corresponding to the initial virtual sample; obtaining the gradient value of the first cross entropy loss, and iteratively updating the initial disturbance vector based on the initial disturbance vector and the gradient value of the first cross entropy loss to obtain the disturbance vector of the t-1 stage. The optimization step is shown in formula (2):

p_{t-1} ← p_{t-1} − α · sign(∇_{p_{t-1}} L_ce[(x_r + p_{t-1}), y_f; f_{t-1}])    (2)

where L_ce[(x_r + p_{t-1}), y_f; f_{t-1}] denotes the first cross entropy loss corresponding to the initial virtual sample; this loss is associated with the virtual sample of the t-1 stage (formed by adding the disturbance vector of the t-1 stage to the real sample) and is calculated in a manner similar to the second cross entropy loss of formula (3). y_f denotes the expected label information of the initial virtual sample, which is the reverse of the label information y_r of the real sample x_r (i.e., the label information of false samples); ∇_{p_{t-1}} denotes the gradient of the first cross entropy loss with respect to the disturbance vector p_{t-1}; sign(·) is the sign function applied to that gradient; and α is the learning rate, which can be set as desired.
When the proportion of real samples of the t-1 stage that have been successfully constructed into virtual samples exceeds the proportion threshold, the iteration can stop, and the disturbance vector p_{t-1} at that moment is stored in the disturbance vector set (also referred to as the UAP pool, denoted P). The proportion threshold may be set manually according to actual requirements, for example 50% or 70%; the application does not limit it.
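For concreteness, the following is a minimal PyTorch sketch of this UAP optimization loop, assuming a frozen stage t-1 model that outputs two-class logits and the label convention real = 0, fake = 1 used later in this description; the function and argument names (optimize_uap, real_loader, fool_ratio, and so on) are illustrative assumptions rather than names from the patent.

```python
import torch
import torch.nn.functional as F

def optimize_uap(model, real_loader, image_shape, alpha=0.01, eps=0.1,
                 fool_ratio=0.5, max_steps=1000, device="cpu"):
    """Sign-gradient optimization of a universal adversarial perturbation p
    that makes the frozen stage t-1 model classify real images as fake,
    following the update rule of formula (2). All names are illustrative."""
    p = torch.zeros(1, *image_shape, device=device, requires_grad=True)  # zero init
    fake_label = 1  # label convention: real = 0, fake/virtual = 1
    model.eval()
    for _ in range(max_steps):
        fooled, total = 0, 0
        for x_real, _ in real_loader:
            x_real = x_real.to(device)
            logits = model(x_real + p)                 # first sample predicted value
            target = torch.full((x_real.size(0),), fake_label, device=device)
            loss = F.cross_entropy(logits, target)     # first cross entropy loss
            grad, = torch.autograd.grad(loss, p)
            with torch.no_grad():
                p -= alpha * grad.sign()               # formula (2) update
                p.clamp_(-eps, eps)                    # ||p||_inf <= eps, per formula (1)
            fooled += (logits.argmax(1) == fake_label).sum().item()
            total += x_real.size(0)
        if fooled / total >= fool_ratio:               # proportion threshold reached
            break
    return p.detach()
```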
Referring to fig. 4, fig. 4 is a schematic diagram illustrating optimization of the disturbance vector at the t-1 stage according to an embodiment of the present application. The image detection model trained in the t-1 stage is shown as model 30f in fig. 4; model 30f can also be denoted model f_{t-1}, and may include a feature extraction component g_{t-1}, a fully connected layer 30d, and a classification layer 30e (which may be a softmax layer, also referred to as a classifier).

The disturbance vector 30c of the t-1 stage may be initialized, and the initialized disturbance vector 30c and a real sample of the t-1 stage combined into an initial virtual sample, which is input to model 30f. The feature extraction component g_{t-1} in model 30f extracts features from the initial virtual sample to obtain the sample description features corresponding to it; the fully connected layer 30d in model 30f then converts the sample description features into a target feature vector. The target feature vector may be input to the classification layer 30e, which identifies the target feature vector and outputs the first sample predicted value corresponding to the initial virtual sample. The sample description features corresponding to the initial virtual sample may be used to characterize the feature information of the target object contained in the initial virtual sample.
Further, the first cross entropy loss is calculated based on the first sample predicted value and the label information of the initial virtual sample (which is the same as the label information of false samples), and the disturbance vector 30c can be iteratively updated by formula (2) above; if the proportion of virtual samples obtained by adding (combining) the iteratively updated disturbance vector 30c to the real samples of the t-1 stage reaches the preset proportion threshold, the iteration may stop, and the disturbance vector 30c at that point serves as the finally optimized disturbance vector of the t-1 stage, which may be added to the disturbance vector set 30b.
The disturbance vector set 30b may be represented as an array. When the finally optimized disturbance vector 30c of the t-1 stage is obtained, a new array may be created whose length extends the length of the array corresponding to the original disturbance vector set 30b, and both the existing disturbance vectors in the original set 30b and the disturbance vector 30c of the t-1 stage are added to the new array.
Step S102, combining the disturbance vectors in the disturbance vector set with the real samples of the t-th stage to obtain virtual samples of the t-th stage, and acquiring false samples of the t-th stage.
Specifically, real samples of the t-th stage and false samples of the t-th stage can be obtained. The false samples of the t-th stage are the newly obtained false samples, different from the false samples of the previous t-1 stages; the real samples of the t-th stage may be the same as or different from those of the previous t-1 stages, and the application does not limit this. For example, all stages of the image detection model may share real samples, i.e., the 1st through t-th stages use the same real samples; alternatively, different stages may use different real samples, e.g., each stage's real samples differ, the real samples of different stages partially overlap, or only some stages use identical real samples.
In the t stage, combining the disturbance vector in the disturbance vector set with a real sample of the t stage to obtain a virtual sample of the t stage; the virtual samples of the t-th stage may refer to virtual samples that are similar to the historical feature distribution of the previous t-1 stage. Specifically, the disturbance vector of each stage included in the disturbance vector set may be sampled to obtain a target disturbance vector, for example, the disturbance vector of any stage may be sampled from the disturbance vector set as the target disturbance vector; and then, the target disturbance vector and the real sample of the t stage can be subjected to summation operation to obtain a virtual sample of the t stage, and tag information is set for the virtual sample of the t stage.
The virtual samples of the t-th stage can be expressed as: D_v = {x_r + p_n | x_r ∈ D_r}, where p_n denotes the sampled target disturbance vector, D_r denotes the real sample set composed of the real samples of the t-th stage, D_v denotes the virtual sample set composed of the virtual samples of the t-th stage, and n may be an integer greater than 1 and less than t. The virtual samples of the t-th stage may also be regarded as false samples of the t-th stage, so the same label information as the false samples may be set for them; for example, the label information of the real samples of the t-th stage may be set to 0, and the label information of both the false samples and the virtual samples of the t-th stage set to 1.
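A short sketch of this construction step, under the same assumptions as the earlier UAP sketch; uap_pool stands for the disturbance vector set P, and all names are illustrative.

```python
import random
import torch

def build_virtual_batch(x_real, uap_pool):
    """Combine a batch of stage-t real samples with a disturbance vector
    sampled from the UAP pool P to form virtual samples (the set D_v above).
    Labels follow the convention real = 0, fake/virtual = 1."""
    p_n = random.choice(uap_pool)      # target disturbance vector from some trained stage
    x_virtual = x_real + p_n           # summation of real samples and the sampled UAP
    y_virtual = torch.ones(x_real.size(0), dtype=torch.long)  # same label as false samples
    return x_virtual, y_virtual
```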
And step S103, correcting the network parameters of the image detection model trained in the t-1 stage according to the real sample in the t stage, the virtual sample in the t stage and the false sample in the t stage to obtain a target detection model.
Specifically, in the training of the t-th stage, the real sample of the t-th stage, the virtual sample of the t-th stage and the false sample of the t-th stage can be used as sample data for training the image detection model of the t-th stage; through the real sample, the virtual sample and the false sample of the t stage, the image detection model of the t-1 stage can be subjected to joint training in the t stage, network parameters of the image detection model trained in the t-1 stage of iteration are continuously updated, and when the training of the image detection model in the t stage reaches the iteration stop condition, the image detection model at the moment can be used as a target detection model trained in the t stage.
The iteration stop condition may be a preset maximum iteration number; in the training of the t-th stage, the training number and the maximum iteration number may be preset. For example, assuming the training number (epoch) of the t-th stage is set to 5, where one epoch is a complete pass over the real, virtual, and false samples of the t-th stage, and the maximum iteration number is set to 10000, then one complete pass takes 2000 iterations; after 5 complete passes over all sample data composed of the real, virtual, and false samples of the t-th stage, the image detection model that has undergone the 5 complete passes is taken as the target detection model trained in the t-th stage.
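A sketch of the stage-t joint training loop described here, reusing the build_virtual_batch helper from the earlier sketch; the optimizer, learning rate, attribute names (backbone, fc, classifier, matching the model sketch shown later in this description), and the simplified distillation loss are assumptions, not the patent's exact recipe.

```python
import torch
import torch.nn.functional as F

def train_stage_t(model_t, model_prev, real_loader, fake_loader, uap_pool,
                  epochs=5, lr=1e-4, lam=1.0, device="cpu"):
    """Stage-t joint training on real, false, and virtual samples. model_t is
    initialized from the stage t-1 model, whose frozen features anchor the
    historical feature distribution. Names and hyperparameters are assumed."""
    model_t.load_state_dict(model_prev.state_dict())  # t-1 model is the initial state
    model_prev.eval()
    opt = torch.optim.Adam(model_t.parameters(), lr=lr)
    for _ in range(epochs):                                  # e.g. 5 complete passes
        for (x_r, y_r), (x_f, y_f) in zip(real_loader, fake_loader):
            x_v, y_v = build_virtual_batch(x_r, uap_pool)    # virtual samples of stage t
            x = torch.cat([x_r, x_f, x_v]).to(device)
            y = torch.cat([y_r, y_f, y_v]).to(device)
            feat_t = model_t.backbone(x).flatten(1)          # g_t description features
            with torch.no_grad():
                feat_prev = model_prev.backbone(x).flatten(1)  # frozen g_{t-1} features
            logits = model_t.classifier(model_t.fc(feat_t))
            loss = F.cross_entropy(logits, y) \
                   + lam * ((feat_t - feat_prev) ** 2).mean()  # simplified distillation term
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model_t
```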
Optionally, the target detection model after the training in the t stage may be online in the application client, that is, the target detection model after the training in the t stage may be formally put into use in the application client, and the target detection model may be used to detect the authenticity of the target object contained in the source image. In other words, for a source image including a target object in an application client, the source image may be input into a target detection model, through which a predicted probability value corresponding to the source image may be output, where the greater the predicted probability value, the greater the probability that the target object in the source image is a false object.
It will be appreciated that the target detection model trained in the t-th stage may be applied to video data containing the target object. After the computer equipment obtains video data containing the target object, the video data can be framed to obtain an initial video frame sequence corresponding to the video data; the video frames containing the target object can then be screened out of the initial sequence and combined, in time order, into a target video frame sequence. The frames in the target video frame sequence are detected by the target detection model trained in the t-th stage, yielding a predicted probability value for each frame in the sequence; based on these predicted probability values the authenticity of the target object in each frame can be determined, and the larger the predicted probability value, the higher the probability that the target object contained in that frame is a false object.
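A sketch of this per-frame video pipeline using OpenCV for framing; the preprocessing, the omission of the object-screening step, and the two-class softmax readout are simplifying assumptions.

```python
import cv2
import torch

def detect_video(model, video_path, threshold=0.5, device="cpu"):
    """Frame the video, run the stage-t target detection model on each frame,
    and map each predicted probability value to a detection result. Object
    screening/cropping and resizing are omitted for brevity."""
    cap = cv2.VideoCapture(video_path)
    probs = []
    model.eval()
    while True:
        ok, frame = cap.read()                    # BGR uint8 frame; color handling omitted
        if not ok:
            break
        x = torch.from_numpy(frame).permute(2, 0, 1).float().div(255)  # HWC -> CHW
        with torch.no_grad():
            logits = model(x.unsqueeze(0).to(device))
        probs.append(logits.softmax(1)[0, 1].item())  # probability of the false class
    cap.release()
    return [("false object" if p >= threshold else "real object") for p in probs]
```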
In the embodiment of the application, the image detection model trained in the t-1 stage can be obtained, along with the disturbance vector set corresponding to the image detection model, where the disturbance vectors in the set are used to represent the historical feature distribution of the image detection model in the t-1 stages, different stages correspond to different false samples, and t is an integer greater than 1. The disturbance vectors in the set can be combined with the real samples of the t-th stage to obtain virtual samples of the t-th stage; the obtained virtual samples preserve the historical feature distribution from the earlier training stages while reducing the storage space required during training. The network parameters of the image detection model trained in the t-1 stage are then corrected using the real, false, and virtual samples of the t-th stage, yielding the target detection model trained in the t-th stage. In other words, during the training of the t-th stage, the new false samples of the t-th stage are trained together with the virtual samples of the t-th stage, so the historical feature distribution of the previous t-1 stages is retained and the detection accuracy of the image detection model can be improved.
Referring to fig. 5, fig. 5 is a flowchart illustrating another image data detection method according to an embodiment of the present application; it will be appreciated that the image data detection method may be performed by a computer device, which may be a server, a terminal device, or a computer program (including program code); the application is not limited in this respect. As shown in fig. 5, the image data detection method may include the following steps S201 to S210:
step S201, acquiring an image detection model trained in the t-1 stage, and acquiring a disturbance vector set corresponding to the image detection model; the disturbance vectors in the disturbance vector set are used for representing the historical characteristic distribution of the image detection model in t-1 stages, and different stages correspond to different false samples.
Step S202, combining the disturbance vectors in the disturbance vector set with the real samples of the t-th stage to obtain the virtual samples of the t-th stage, and acquiring the false samples of the t-th stage.
The specific implementation process of step S201 and step S202 may refer to step S101 to step S102 in the embodiment corresponding to fig. 3, and will not be described herein.
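As a rough illustration of step S202, the sketch below builds virtual samples by sampling one stored perturbation vector and adding it to a batch of real samples. The image-shaped perturbation and the label convention (tagging virtual samples with the fake label, since they stand in for historical forgeries) are assumptions, not details fixed by this embodiment:

```python
# A sketch of step S202: virtual sample = real sample + sampled perturbation.
# Assumes perturbation vectors share the shape of an image tensor, and that
# virtual samples are tagged with the fake label 1 (an assumption).
import random
import torch

def build_virtual_samples(real_batch, perturbation_set, virtual_label=1.0):
    delta = random.choice(perturbation_set)      # target disturbance vector
    virtual_batch = real_batch + delta           # element-wise summation
    labels = torch.full((virtual_batch.size(0),), virtual_label)
    return virtual_batch, labels
```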
Step S203, determining the real sample of the t stage, the virtual sample of the t stage and the false sample of the t stage as a sample data set of the t stage, and outputting a second sample predicted value corresponding to the sample data in the sample data set through the image detection model trained by the t-1 stage.
Specifically, in the training of the t-th stage, the real samples, the virtual samples, and the false samples of the t-th stage may be determined as the sample data set of the t-th stage (which may be denoted as $D_t$); all the real samples of the t-th stage may be combined into a real sample set of the t-th stage (which may be denoted as $D_t^r$), all the virtual samples of the t-th stage may be combined into a virtual sample set (which may be denoted as $\tilde{D}_t$), and all the false samples of the t-th stage may be combined into a false sample set (which may be denoted as $D_t^f$).
The image detection model of the t-th stage may be denoted as model $f_t$, and the image detection model trained in the t-1-th stage may be used as the initial state of the image detection model of the t-th stage, that is, the image detection model trained in the t-1-th stage may be used as the initial state of model $f_t$. All sample data in the sample data set of the t-th stage (the real samples, the virtual samples, and the false samples of the t-th stage may all be regarded as sample data in the sample data set) may be input to model $f_t$ in batches, and the second sample predicted value corresponding to each piece of sample data in the sample data set may be output through model $f_t$.
Since the image detection model trained in the t-1-th stage is the initial state of the image detection model of the t-th stage, when the training of the t-th stage starts, the sample data in the sample data set is essentially input into the image detection model trained in the t-1-th stage, and the image detection model trained in the t-1-th stage outputs the second sample predicted values corresponding to the sample data in the sample data set. It will be appreciated that the network parameters of the image detection model of the t-th stage (i.e., model $f_t$) are continuously updated and iterated, and the sample data in the sample data set may be input into the model $f_t$ containing the latest network parameters, which outputs the second sample predicted values corresponding to the sample data in the sample data set.
The image detection model of the t-th stage (model $f_t$) may include a feature extraction component $g_t$, a fully connected layer, and a classification layer. The feature extraction component $g_t$ may be used to extract features of the sample data in the sample data set; the fully connected layer is used to map the output features of the feature extraction component $g_t$ (which may be referred to as sample description features, used to characterize the feature information of the target object contained in the sample data) into a target feature vector; and the classification layer may be used to output the second sample predicted value corresponding to the target feature vector. The feature extraction component $g_t$ may adopt a convolutional neural network, a wavelet scattering network, or a residual network, and the embodiment of the application does not limit the type of the feature extraction component.
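A minimal sketch of this three-part model follows, assuming a PyTorch ResNet-18 backbone for the feature extraction component $g_t$ and a sigmoid classification layer producing a probability; the backbone choice and the feature dimensions are assumptions, since the embodiment leaves them open:

```python
# A sketch of model f_t: feature extraction component g_t, a fully connected
# layer, and a classification layer. The ResNet-18 backbone and the feature
# dimensions are assumptions; the embodiment does not fix them.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class DetectionModel(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = resnet18(weights=None)
        self.g = nn.Sequential(*list(backbone.children())[:-1])  # g_t
        self.fc = nn.Linear(512, 128)        # maps features to a target vector
        self.classify = nn.Sequential(nn.Linear(128, 1), nn.Sigmoid())

    def features(self, x):
        # sample description features output by g_t
        return self.g(x).flatten(1)

    def forward(self, x):
        # second sample predicted value (a probability) for each input
        return self.classify(self.fc(self.features(x))).squeeze(1)
```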
Step S204, determining a second cross entropy loss of the t-th stage according to the total number of samples in the sample data set, the second sample predicted value corresponding to the sample data in the sample data set, and the label information corresponding to the sample data in the sample data set.
Specifically, the difference between a target constant (the target constant may take a value of 1 here) and the second sample predicted value corresponding to the sample data in the sample data set may be subjected to logarithmic processing to obtain a first logarithmic value; the difference between the target constant and the label information corresponding to the sample data in the sample data set is determined as a label difference value, and the product of the label difference value and the first logarithmic value is determined as a first product value; the second sample predicted value corresponding to the sample data in the sample data set is subjected to logarithmic processing to obtain a second logarithmic value; the product of the label information corresponding to the sample data and the second logarithmic value is summed with the first product value, and these sums are accumulated over the sample data set to obtain a sample accumulated value corresponding to the sample data set; the second cross entropy loss of the t-th stage is then determined based on the ratio between the sample accumulated value and the total number of samples in the sample data set, taken with a negative sign. The second cross entropy loss is associated with all sample data in the sample data set of the t-th stage and is shown in the following formula (3):
$$\mathcal{L}_{ce}^{t} = -\frac{1}{|D_t|}\sum_{(x_t,\,y_t)\in D_t}\Big[\,y_t\log f_t(x_t) + (1-y_t)\log\big(1-f_t(x_t)\big)\Big] \tag{3}$$

where $\mathcal{L}_{ce}^{t}$ represents the second cross entropy loss of the t-th stage, $|D_t|$ represents the total number of samples in the sample data set $D_t$ of the t-th stage, $x_t$ represents sample data in $D_t$, $y_t$ represents the label information corresponding to the sample data $x_t$, and $f_t(x_t)$ represents the second sample predicted value corresponding to the sample data $x_t$. Here $\log\big(1-f_t(x_t)\big)$ is the first logarithmic value, $(1-y_t)$ is the label difference value, $(1-y_t)\log\big(1-f_t(x_t)\big)$ is the first product value, and $\log f_t(x_t)$ is the second logarithmic value.
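A direct transcription of formula (3) as reconstructed above might look as follows, assuming the label convention $y=1$ for false objects (consistent with "the greater the predicted probability value, the greater the probability that the target object is a false object"); the small epsilon is a numerical-stability assumption:

```python
# Formula (3): binary cross entropy averaged over the sample data set D_t.
import torch

def second_cross_entropy_loss(pred, y, eps=1e-8):
    # pred: f_t(x_t) in (0, 1); y: label information (1 = false object, assumed)
    return -(y * torch.log(pred + eps)
             + (1.0 - y) * torch.log(1.0 - pred + eps)).mean()
```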
Step S205, determining the virtual entropy loss of the t-th stage according to the number of virtual samples in the sample data set, the second sample predicted value corresponding to the virtual samples in the sample data set, and the label information corresponding to the virtual samples in the sample data set.
Specifically, the second sample predicted value corresponding to the virtual sample in the sample data set may be subjected to logarithmic processing to obtain a third logarithmic value; accumulating the product of the label information corresponding to the virtual sample in the sample data set and the third logarithmic value to obtain a virtual accumulated value corresponding to the sample data set; and obtaining the virtual entropy loss of the t-th stage according to the ratio between the virtual accumulated value and the virtual sample number in the sample data set. Wherein the virtual entropy loss is associated with the virtual sample of the t-th stage, the virtual entropy loss can be represented by the following formula (4):
$$E_t = -\frac{1}{|\tilde{D}_t|}\sum_{\tilde{x}_t\in\tilde{D}_t}\tilde{y}_t\log f_t(\tilde{x}_t) \tag{4}$$

where $E_t$ represents the virtual entropy loss of the t-th stage, $|\tilde{D}_t|$ represents the number of virtual samples in the virtual sample set $\tilde{D}_t$ of the t-th stage, $\tilde{x}_t$ represents a virtual sample of the t-th stage, $\tilde{y}_t$ represents the label information corresponding to the virtual sample, and $f_t(\tilde{x}_t)$ represents the second sample predicted value corresponding to the virtual sample $\tilde{x}_t$; $\log f_t(\tilde{x}_t)$ is the third logarithmic value.
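Formula (4) restricted to the virtual samples can be sketched as below, under the same label and sign conventions assumed for formula (3):

```python
# Formula (4): virtual entropy loss over the virtual sample set only.
import torch

def virtual_entropy_loss(pred_virtual, y_virtual, eps=1e-8):
    # third logarithmic value: log f_t(x~); averaged over |D~_t| virtual samples
    return -(y_virtual * torch.log(pred_virtual + eps)).mean()
```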
Step S206, determining the virtual mean square error loss of the t stage according to the number of the virtual samples in the sample data set, the sample description characteristics of the virtual samples in the sample data set at the t stage and the sample description characteristics of the virtual samples in the sample data set at the t-1 stage.
Specifically, the sample description features of the virtual samples in the sample data set at the t-1-th stage may be output through the image detection model of the t-1-th stage; the sample description features of the t-1-th stage refer to the output features of the feature extraction component $g_{t-1}$ of the image detection model of the t-1-th stage (model $f_{t-1}$). The sample description features of the virtual samples in the sample data set at the t-th stage may be output through the image detection model of the t-th stage; the sample description features of the t-th stage refer to the output features of the feature extraction component $g_t$ of the image detection model of the t-th stage (model $f_t$). Further, the virtual sample error between the sample description feature of a virtual sample at the t-th stage and the sample description feature of the same virtual sample at the t-1-th stage may be obtained, and the virtual mean square error loss of the t-th stage may be determined according to the virtual sample errors corresponding to the virtual samples in the sample data set and the number of virtual samples in the sample data set. In other words, in the training of the t-th stage, the feature extraction component $g_{t-1}$ in the image detection model of the t-1-th stage may be used to perform knowledge distillation on the sample description features corresponding to the virtual samples of the t-th stage. The virtual mean square error loss of the t-th stage is associated with the virtual samples in the sample data set of the t-th stage and may be shown in the following formula (5):
$$\mathcal{L}_{vmse}^{t} = \frac{1}{|\tilde{D}_t|}\sum_{\tilde{x}_t\in\tilde{D}_t}\big\|\,g_t(\tilde{x}_t) - g_{t-1}(\tilde{x}_t)\,\big\|_2^2 \tag{5}$$

where $\mathcal{L}_{vmse}^{t}$ represents the virtual mean square error loss of the t-th stage, $g_t(\tilde{x}_t)$ represents the sample description feature of a virtual sample of the t-th stage at the t-th stage, $g_{t-1}(\tilde{x}_t)$ represents the sample description feature of the virtual sample at the t-1-th stage, $g_t(\tilde{x}_t)-g_{t-1}(\tilde{x}_t)$ represents the virtual sample error between the two stages, and $\|\cdot\|_2$ denotes the L2 norm. By using the feature extraction component $g_{t-1}$ in the image detection model of the t-1-th stage to perform knowledge distillation on the virtual samples of the t-th stage, it can be ensured that the feature distribution of the virtual samples of the t-th stage is not affected by the change of the image detection model.
Step S207, determining the real mean square error loss of the t stage according to the sample description characteristic of the real sample in the sample data set at the t stage and the sample description characteristic of the real sample in the sample data set at the t-1 stage.
Specifically, the sample description characteristics of the real sample in the sample data set in the t-1 stage can be output through the image detection model in the t-1 stage; outputting sample description characteristics of real samples in a sample data set in a t stage through an image detection model in the t stage; acquiring sample description characteristics of real samples in a sample data set at a t-th stage and real sample errors of the real samples in the sample data set between the sample description characteristics of the real samples in the t-1 th stage; and accumulating the real sample errors corresponding to the real samples in the sample data set to obtain the real mean square error loss of the t stage. Wherein the true mean square error loss of the t-th stage is associated with the true samples in the sample data set of the t-th stage, the true mean square error loss of the t-th stage can be represented by the following formula (6):
$$\mathcal{L}_{rmse}^{t} = \sum_{x_r\in D_t^{r}}\big\|\,g_t(x_r) - g_{t-1}(x_r)\,\big\|_2^2 \tag{6}$$

where $\mathcal{L}_{rmse}^{t}$ represents the real mean square error loss of the t-th stage, $g_t(x_r)$ represents the sample description feature of a real sample of the t-th stage at the t-th stage, $g_{t-1}(x_r)$ represents the sample description feature of the real sample at the t-1-th stage, and $g_t(x_r)-g_{t-1}(x_r)$ represents the real sample error between the two stages. By using the feature extraction component $g_{t-1}$ in the image detection model of the t-1-th stage to perform knowledge distillation on the real samples of the t-th stage, the stability of the feature distribution of the real samples can be maintained, and through this stability the feature distribution of the virtual samples of the t-th stage can be kept consistent with the historical feature distribution of the previous t-1 stages.
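The two distillation terms of formulas (5) and (6) can be transcribed as below; the inputs are the sample description features produced by $g_t$ and by the frozen $g_{t-1}$, and the per-sample averaging in (5) versus accumulation in (6) follows the reconstruction above and should be treated as an assumption:

```python
# Formulas (5) and (6): feature-level knowledge distillation from the frozen
# stage t-1 feature extractor g_{t-1} to the current g_t.
import torch

def virtual_mse_loss(feat_t_virtual, feat_prev_virtual):
    # (5): squared L2 error, averaged over the virtual samples
    return (feat_t_virtual - feat_prev_virtual).pow(2).sum(dim=1).mean()

def real_mse_loss(feat_t_real, feat_prev_real):
    # (6): squared L2 error, accumulated over the real samples
    return (feat_t_real - feat_prev_real).pow(2).sum(dim=1).sum()
```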
Step S208, determining the model loss of the t-th stage according to the second cross entropy loss, the virtual entropy loss, the virtual mean square error loss, and the real mean square error loss, and correcting the network parameters of the image detection model trained in the t-1-th stage based on the model loss of the t-th stage to obtain the target detection model.
Specifically, a balance coefficient associated with the virtual mean square error loss and the real mean square error loss may be obtained, and the product of the balance coefficient and the sum of the virtual mean square error loss and the real mean square error loss may be determined as the sample mean square error loss of the t-th stage; a summation operation is then performed on the second cross entropy loss, the virtual entropy loss, and the sample mean square error loss to obtain the model loss of the t-th stage. The model loss may be represented by the following formula (7):
$$L_t = \mathcal{L}_{ce}^{t} + E_t + \beta\big(\mathcal{L}_{vmse}^{t} + \mathcal{L}_{rmse}^{t}\big) \tag{7}$$

where $\beta$ represents the balance coefficient associated with the virtual mean square error loss $\mathcal{L}_{vmse}^{t}$ and the real mean square error loss $\mathcal{L}_{rmse}^{t}$, and is used to balance the sum of these two losses against the other terms; it may be set manually according to actual requirements, and the application does not limit the value of $\beta$.
The model loss represented by formula (7) may be used for the training of the t-th stage of the image detection model: by minimizing the model loss $L_t$, the network parameters of the image detection model trained in the t-1-th stage are continuously updated and iterated, and when the training of the image detection model in the t-th stage reaches the iteration stop condition, the image detection model at that moment may be used as the target detection model for which the training of the t-th stage is completed.
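Putting formulas (3) to (7) together, one stage-t optimisation step might look as follows; this reuses the loss helpers and the DetectionModel sketch above, with `model_prev` the frozen stage t-1 model and boolean masks marking which rows of the batch are virtual or real samples (all names are illustrative assumptions):

```python
# One training step of stage t, minimising the model loss L_t of formula (7);
# reuses second_cross_entropy_loss, virtual_entropy_loss, virtual_mse_loss,
# and real_mse_loss sketched above. `beta` is the balance coefficient.
import torch

def train_step(model_t, model_prev, optimizer, x, y, virt_mask, real_mask, beta):
    pred = model_t(x)                     # second sample predicted values
    feat_t = model_t.features(x)          # g_t sample description features
    with torch.no_grad():                 # g_{t-1} stays frozen
        feat_prev = model_prev.features(x)

    loss = second_cross_entropy_loss(pred, y)                            # (3)
    loss = loss + virtual_entropy_loss(pred[virt_mask], y[virt_mask])    # (4)
    mse = (virtual_mse_loss(feat_t[virt_mask], feat_prev[virt_mask])     # (5)
           + real_mse_loss(feat_t[real_mask], feat_prev[real_mask]))     # (6)
    loss = loss + beta * mse                                             # (7)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```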
Referring to fig. 6, fig. 6 is a training schematic diagram of an image detection model according to an embodiment of the present application. As shown in fig. 6, during the training process of the t-th stage, a real sample set 40e and a false sample set 40c of the t-th stage may be acquired; a disturbance vector set 40a corresponding to the first t-1 stages may also be obtained, the disturbance vectors of the respective stages contained in the disturbance vector set 40a may be sampled, and a target disturbance vector 40b may be determined from the disturbance vector set 40a. Further, the target disturbance vector 40b may be added to the real samples in the real sample set 40e to obtain a virtual sample set 40d of the t-th stage.
Further, the sample data in the real sample set 40e, the false sample set 40c, and the virtual sample set 40d may be input to the feature extraction component $g_t$ in the image detection model of the t-th stage, so as to obtain the real sample features corresponding to the real samples in the real sample set 40e, the false sample features corresponding to the false samples in the false sample set 40c, and the virtual sample features corresponding to the virtual samples in the virtual sample set 40d. The sample data in the real sample set 40e and the virtual sample set 40d may likewise be input to the feature extraction component $g_{t-1}$ in the image detection model of the t-1-th stage, so as to obtain the corresponding real sample features and virtual sample features at the t-1-th stage. It is appreciated that the aforementioned false sample features, virtual sample features, and real sample features may all be referred to as sample description features.
After the false sample features, virtual sample features, and real sample features output by the feature extraction component $g_t$ pass through the fully connected layer and the classification layer in the image detection model, second sample predicted values may be output, and the second cross entropy loss of the t-th stage may be calculated from the second sample predicted values and the label information of the sample data in the real sample set 40e, the false sample set 40c, and the virtual sample set 40d, as shown in the foregoing formula (3). From the virtual sample features and real sample features output by the feature extraction component $g_t$, together with the virtual sample features and real sample features output by the feature extraction component $g_{t-1}$, the mean square error losses can be calculated (including the virtual mean square error loss and the real mean square error loss, as shown in the foregoing formula (5) and formula (6)); the image detection model of the t-th stage can then be trained through the second cross entropy loss, the mean square error losses, and the virtual entropy loss corresponding to the virtual sample set 40d (as shown in formula (4)), so as to obtain the target detection model.
It will be appreciated that in the training of the t-th stage, similar to the first t-1 stages, it is also necessary to preserve the sample feature distribution in the t-th stage, i.e. to calculate the disturbance vector of the t-th stage; the optimization mode of the disturbance vector of the t-th stage can be referred to the foregoing formula (1) and formula (2), and the optimization modes of the disturbance vectors of the respective stages can be the same, except that the optimization of the disturbance vector of the respective stages is associated with the image detection model trained by the stage and the true sample of the stage; for example, the optimization process of the disturbance vector of the t-th stage is associated with the target detection model trained by the t-th stage and the true samples of the t-th stage. Based on the optimization modes corresponding to the formula (1) and the formula (2), after the disturbance vector finally optimized in the t-th stage is obtained, the disturbance vector finally optimized can be used as the sample characteristic distribution of the t-th stage to be added into a disturbance vector set. The disturbance vector of each stage (including the disturbance vector of the t-th stage) is added in the disturbance vector set, so that the sample characteristic distribution of each stage can be reserved, the model training of the subsequent stage is facilitated, and the forgetting problem of the image detection model in the training of the subsequent stage is relieved.
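The referenced formulas (1) and (2) appear earlier in the document; purely as a sketch, the disturbance-vector optimisation can be pictured as gradient descent on a cross-entropy objective computed on perturbed real samples, following the update described for the earlier stages. The fake target label and the plain gradient-descent update rule are assumptions, not details confirmed here:

```python
# A sketch of disturbance-vector optimisation: an image-shaped perturbation
# delta is iteratively updated so that real samples plus delta reproduce the
# stage's feature distribution. The fake target label (1) and the plain
# gradient-descent update are assumptions; formulas (1)-(2) appear earlier.
import torch

def optimize_perturbation(model, real_batch, steps=100, lr=0.01, eps=1e-8):
    delta = torch.zeros_like(real_batch[0], requires_grad=True)  # initial vector
    target = torch.ones(real_batch.size(0))                      # label information
    for _ in range(steps):
        pred = model(real_batch + delta)         # first sample predicted values
        loss = -(target * torch.log(pred + eps)
                 + (1 - target) * torch.log(1 - pred + eps)).mean()
        grad, = torch.autograd.grad(loss, delta)  # gradient of the cross entropy
        with torch.no_grad():
            delta -= lr * grad                    # iterative update of delta
    return delta.detach()   # finally optimised vector, added to the vector set
```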
Step S209, a source image containing a target object is acquired, the source image is input into a target detection model, and object description features corresponding to the source image are output through a feature extraction component in the target detection model.
Specifically, the target detection model trained in the present t stage may be online in various application products related to image detection, such as an application client, an applet, a website, etc., and in the application products, the image or video containing the target object in the application products may be detected by using the target detection model trained in the t stage, so as to determine the authenticity of the target object contained in the image or video.
In a detection scene of an application product, image data to be detected can be acquired, and the image data is used as a source image; or the video data to be detected can be obtained, and the video data is subjected to framing treatment to obtain a video frame sequence corresponding to the video data, wherein each video frame in the video frame sequence can be used as a source image. The computer equipment can input the source image into a target detection model trained in the t stage, and can perform feature extraction on the source image through a feature extraction component in the target detection model to obtain an object description feature corresponding to the source image, wherein the object description feature refers to an output feature of the source image after passing through the feature extraction component in the target detection model. The extraction process of the object description features is similar to that of the sample description features, and will not be described here again.
In one or more embodiments, after the computer device acquires the source image, it may first detect whether a target object exists in the source image, and if it detects that the target object does not exist in the source image, it may discard the source image, without performing subsequent processing on the source image; if the target object is detected to exist in the source image, the object area where the target object is located in the source image can be determined, and the object area is cut out from the source image. The embodiment of the application may detect whether the target object exists in the source image by using a target detection technology, where the target detection technology may include, but is not limited to: SSD (Single Shot MultiBox Detector, a general object detection algorithm) and YOLO (You Only Look Once, a target detection model) models, etc.; for example, assuming that the target object is a human face, DSFD (Dual Shot Face Detector, a human face detection network), MTCNN (Multi-task convolutional neural network ), or other face detection techniques may be employed to detect whether a human face is present in the source image.
After cutting out an object region containing a target object from a source image, the object region can be directly input into a target detection model trained in the t-th stage; the size of the object region may be adjusted, for example, the size of the object region may be enlarged by 1.2 times, or the object region in the source image may be enlarged by 1.2 times with the object region as the center, and the object region including the target object may be cut out from the enlarged source image. And through a feature extraction component in the target detection model, feature extraction can be carried out on the object region, and object description features corresponding to the source image are obtained.
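As an illustration of this optional region adjustment, the sketch below enlarges a detected face box by a factor of 1.2 about its centre and crops it, clamping to the image bounds; the numpy image layout and the (x, y, w, h) box format are assumptions about the upstream detector:

```python
# A sketch of the 1.2x region enlargement described above; assumes a numpy
# image of shape (H, W, C) and a detector box in (x, y, w, h) pixel format.
def enlarge_and_crop(image, box, scale=1.2):
    img_h, img_w = image.shape[:2]
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0      # keep the object region centred
    nw, nh = w * scale, h * scale          # enlarged region size
    x0, y0 = max(int(cx - nw / 2), 0), max(int(cy - nh / 2), 0)
    x1, y1 = min(int(cx + nw / 2), img_w), min(int(cy + nh / 2), img_h)
    return image[y0:y1, x0:x1]             # object region fed to the model
```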
Step S210, identifying object description features through a classifier in the target detection model, and outputting predicted probability values corresponding to the object description features; and determining a detection result corresponding to the target object contained in the source image according to the prediction probability value.
Specifically, the target detection model trained in the t-th stage may include a feature extraction component, a fully connected layer, and a classification layer (which may also be referred to as a classifier, for example, the classification layer may be a softmax layer). The feature extraction component in the target detection model outputs object description features and can input the object description features into a full-connection layer, and the input object description features can be subjected to dimension reduction through the full-connection layer to obtain target feature vectors; and the output result (target feature vector) of the full-connection layer can be identified through a classification layer in the target detection model, and a prediction probability value corresponding to the object description feature can be obtained. If the predicted probability value is greater than or equal to the probability threshold value, determining that a detection result corresponding to the target object contained in the source image is a false object; if the predicted probability value is smaller than the probability threshold value, determining that a detection result corresponding to a target object contained in the source image is a real object, wherein the larger the predicted probability value is, the larger the probability that the target object in the source image is a false object is; the probability threshold may be set according to actual requirements, which is not limited by the present application.
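Steps S209 and S210 then reduce to a short inference routine, sketched below; `preprocess` (resizing and normalising the cropped region into a tensor) is a hypothetical helper, and the probability threshold is left configurable since the embodiment does not fix it:

```python
# A sketch of steps S209-S210: feature extraction, classification, and
# thresholding of the predicted probability value. `model` follows the
# DetectionModel sketch above; `preprocess` is a hypothetical helper.
import torch

@torch.no_grad()
def detect_source_image(model, source_image, preprocess, threshold=0.5):
    x = preprocess(source_image).unsqueeze(0)   # shape [1, C, H, W]
    prob = model(x).item()                      # predicted probability value
    result = "false object" if prob >= threshold else "real object"
    return result, prob
```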
It will be appreciated that after the training of the t-th stage is completed, as computer vision technology develops, some new types of false data may subsequently appear, such as false data that the target detection model has never seen in the training of the first t stages; these new types of false data may then be used as the false samples of the t+1-th stage. Further, any one disturbance vector may be sampled from the disturbance vectors of the first t stages contained in the disturbance vector set, and the sampled disturbance vector may be added to a real sample of the t+1-th stage (the real sample of the t+1-th stage may be the same as a real sample of any one of the first t stages, or a new image containing a real object may be used as the real sample of the t+1-th stage, which is not limited in the present application), so as to construct a virtual sample of the t+1-th stage. The false samples, the virtual samples, and the real samples of the t+1-th stage may then be used to continue the joint training of the target detection model trained in the t-th stage, that is, the network parameters of the target detection model of the t-th stage are adjusted to obtain the image detection model trained in the t+1-th stage.
After the training of the t+1-th stage is finished, the target detection model trained in the t-th stage may be taken offline in the application product, for example, by setting the target detection model trained in the t-th stage to a disabled state in the application product, and the image detection model trained in the t+1-th stage may then be brought online in the application product, that is, the image detection model trained in the t+1-th stage replaces the target detection model previously online in the application product. In this way, the image detection model online in the application product is continuously optimized and updated and always remains the most recently trained model, so that the detection accuracy of the image detection model can be improved.
Optionally, after the training of the t+1th stage is completed, the image detection model trained in the t+1th stage can be online in the application product, and meanwhile, the target detection model trained in the t stage on the online in the application product can be reserved.
In one or more embodiments, the target detection model trained by the embodiments of the present application may be applied to false object detection products, such as face-based identity verification: the collected facial video or image may be input to the target detection model trained in the t-th stage, and a predicted probability value corresponding to the facial video or image may be output by the target detection model, where the predicted probability value may be used to indicate the probability that the face in the facial video or image has been replaced; the greater the predicted probability value, the greater the probability that the face has been replaced. A replaced face refers to a face that has been swapped for another face using some specific technique (e.g., an AI (artificial intelligence) face-swapping technique), such as the face of original object A being replaced by the face of object B. By applying the target detection model to the face-based identity verification scene, the security of identity authentication can be improved.
Referring to fig. 7, fig. 7 is a schematic diagram of a detection scene based on a target detection model according to an embodiment of the application; the embodiment of the application is described by taking a face-based identity verification scene as an example. It will be appreciated that when object A purchases goods on a shopping platform or in a physical store, payment can be made by a face payment method, where the face payment method means that the facial image of object A needs to be collected in real time to check whether the collected facial image is object A itself. Alternatively, when object A needs to view specific information in an application client (e.g., identity information about object A), identity authentication may be required, such as collecting a facial image or video of object A in real time to verify whether the collected facial image or video is object A itself. In the above situations, or in other similar situations requiring face payment or face-based identity authentication, the trained target detection model can be used to detect whether the collected facial image or video shows the user's real face.
As shown in fig. 7, if an object a wants to use a terminal device 50a (which may be regarded as a computer device) to log in an information system to view personal related information, the object a needs to perform identity authentication first, and only if the object a passes the identity authentication, the object a can access the information system to perform information query. When the object a inputs a web address of the information system in the browser of the terminal device 50a, or starts an applet, an application client, or the like corresponding to the information system, an authentication page 50c may be displayed in the terminal device 50a, and the authentication page 50c may include a scan area 50b, and the scan area 50b may be used to collect a facial image of the object a. In this case, a prompt message, such as "align face to scan area", may be displayed in the authentication page 50c, where the prompt message is used to instruct the subject a how to perform authentication.
The object a faces the face to the scanning area 50b in the identity authentication page 50c, and adjusts the distance between the object a and the terminal device 50a, so that the scanning area 50b can collect the complete face of the object a. During the acquisition of the facial image of subject a in the scan area, verification cues may also be displayed in the identity authentication page 50c, which may include, but are not limited to, certain actions such as blinking, nodding, waving, opening the mouth, waving to the right, waving to the left, etc.; object a may act accordingly in accordance with the verification prompt in authentication page 50 c.
If the terminal device 50a collects the facial image 50d through the scanning area 50b, the online target detection model may be invoked, the collected facial image 50d may be input into the target detection model, the facial image 50d may be detected through the target detection model, and a predicted probability value corresponding to the facial image 50d may be output. If the predicted probability value is greater than or equal to a preset probability threshold, it may be determined that the face in the collected facial image 50d is not the real face of object A but a replaced false face; that is, the collected facial image 50d may not show object A itself, or the facial image 50d may have been synthesized by a synthesis technique. In this case, it may be determined that object A fails the identity authentication of the information system, and a prompt message indicating that the authentication fails may be displayed in the identity authentication page 50c to prompt object A to perform identity authentication again.
If the predicted probability value is smaller than the preset probability threshold value, the face in the collected face image 50d can be determined to be the real face of the object A, which indicates that the object A passes the identity authentication of the information system; and then the information system can be accessed, and the content which needs to be checked can be inquired in the information system. The object to be accessed to the information system is subjected to identity authentication through the target detection model, so that the security of the identity authentication can be improved.
In one or more embodiments, the target detection model may also be applied in a evidence counterfeit detection scene, and the evidence provided by the user is detected through the target detection model, so that the authenticity of the evidence provided by the user can be determined, the user can be prevented from making false evidence by using a deep counterfeit technology, and the authenticity and fairness of the evidence can be improved.
In one or more embodiments, the object detection model may also be applied in a photo and video authentication scene. In current multimedia platforms (e.g., various video clients, short video clients, news clients, etc.), a large number of false videos or images may be included; for example, a user seeking attention may change the face of object A in an original video to the face of object B, make a false video, and distribute it to the multimedia platform, and the propagation of such false videos may reduce the public confidence in the media and mislead users of the multimedia platform. Therefore, the videos or images in the multimedia platform can be screened through the target detection model, and mark information (for example, "virtual object" or "created by AI technology") can be added to the detected false videos or images, so that the credibility of the images or videos in the multimedia platform can be improved.
Referring to fig. 8, fig. 8 is a schematic diagram of a data authentication scene of an object detection model according to an embodiment of the application. As shown in fig. 8, the computer device may acquire multimedia data (which may include image data and video data in the multimedia platform) in the multimedia platform, perform preliminary screening on the acquired multimedia data, and screen multimedia data including a face from the multimedia platform; for example, all video data and image data including a face region of a person may be screened from a multimedia platform, such as a variety video, an entertainment news video, a television show video, a movie video, a person photo, a person street shot, a self-photograph, and the like. As shown in fig. 8, multimedia data including a face in the multimedia platform may include image data 60a, video data 60b, image data 60c, video data 60d, and the like.
Further, the initially screened multimedia data may be sequentially input to the target detection model, and after passing through the feature extraction component $g_t$, the fully connected layer, and the classification layer in the target detection model, the predicted probability value corresponding to each piece of multimedia data can be output, and whether the face contained in each piece of multimedia data has been replaced can be determined according to the predicted probability value. As shown in fig. 8, if the predicted probability value of the image data 60a output by the object detection model is the probability value a1, and the probability value a1 is greater than the predetermined probability threshold, it is determined that the face in the image data 60a has been replaced, that is, the face in the image data 60a is a false face, and the image data 60a may be provided with the mark information "created by AI technology", by which the user is informed that the image data 60a was synthesized by artificial intelligence (AI) technology. Of course, if the probability value a1 is smaller than the predetermined probability threshold, it may be determined that the face in the image data 60a has not been replaced and is a real face, and no mark information needs to be added to the image data 60a.
The predicted probability value of the video data 60b output by the target detection model is a probability value a2, if the probability value a2 is smaller than a preset probability threshold value, it can be determined that the face in the video data 60b is not replaced and is a real face, and then no marking information needs to be added to the video data 60 b. The predicted probability value of the image data 60c output by the target detection model is a probability value a3, if the probability value a3 is smaller than a preset probability threshold value, it can be determined that the face in the image data 60c is not replaced and is a real face, and then no marking information is required to be added to the image data 60 c.
If the predicted probability value of the video data 60d output by the object detection model is the probability value a4, and the probability value a4 is greater than the predetermined probability threshold, it is determined that a face in the video data 60d has been replaced, that is, a false face exists in the video data 60d, and the mark information "created by AI technology" may be added to the video data 60d to inform the user that the video data 60d was synthesized by artificial intelligence (AI) technology. It will be appreciated that if the video data 60d contains the faces of a plurality of objects and only the face of one object has been replaced, the mark information added to the video data 60d may instead read "the face of object xx is created by AI technology", where object xx is the object in the video data 60d whose face has been replaced.
By utilizing the target detection model, false video data or image data in the multimedia platform are screened out, and marking information is added to the false video data or image data, real data (such as multimedia data without face replacement) and false data (such as multimedia data with face replacement) in the multimedia platform can be better distinguished, and the credibility of video or image content in the multimedia platform can be improved.
In the embodiment of the application, an image detection model trained in the t-1 stage can be obtained, and a disturbance vector set corresponding to the image detection model is obtained, wherein disturbance vectors in the disturbance vector set are used for representing the historical characteristic distribution of the image detection model in the t-1 stage, different stages correspond to different false samples, and t is an integer greater than 1; the disturbance vector in the disturbance vector set and the real sample of the t stage can be combined to obtain a virtual sample of the t stage, and the obtained virtual sample can keep the historical characteristic distribution of the image detection model during the training of the previous stage and reduce the storage space of the image detection model during the training process; and correcting network parameters of the image detection model trained in the t-1 stage through the real sample, the false sample and the virtual sample in the t stage, so that a target detection model trained in the t stage can be obtained. In other words, in the training process of the t-th stage, when the new false sample in the t-th stage is trained, the training process is performed together with the virtual sample of the t-th stage, so that the historical characteristic distribution of the previous t-1 stages can be reserved, and the detection accuracy of the image detection model can be improved.
It will be appreciated that the specific embodiments of the present application may involve user-related data such as videos or images of a user (e.g., videos or images of a user's face); when the above embodiments of the present application are applied to specific products or technologies, permission or consent from the user or other relevant subject needs to be obtained, and the collection, use, and processing of the relevant data need to comply with the relevant laws, regulations, and standards of the relevant countries and regions.
Referring to fig. 9, fig. 9 is a schematic structural diagram of an image data detection device according to an embodiment of the present application. As shown in fig. 9, the image data detection apparatus 1 includes: the set acquisition module 11, the sample construction module 12, and the parameter correction module 13;
the set acquisition module 11 is used for acquiring an image detection model trained in the t-1 stage and acquiring a disturbance vector set corresponding to the image detection model; the disturbance vector in the disturbance vector set is used for representing the historical characteristic distribution of the image detection model in t-1 stages, different stages correspond to different false samples, and t is an integer greater than 1;
the sample construction module 12 is configured to combine the disturbance vector in the disturbance vector set with a real sample of a t stage to obtain a virtual sample of the t stage, and obtain a false sample of the t stage;
The parameter correction module 13 is configured to correct network parameters of the image detection model trained in the t-1 th stage according to the real sample in the t-1 th stage, the virtual sample in the t-1 th stage, and the false sample in the t-th stage, so as to obtain a target detection model; the target detection model is an image detection model trained in the t-th stage, and is used for detecting the authenticity of a target object contained in the source image.
The specific functional implementation manners of the set obtaining module 11, the sample constructing module 12, and the parameter correcting module 13 may refer to steps S101 to S103 in the embodiment corresponding to fig. 3, which are not described herein.
In one or more embodiments, the set acquisition module 11 includes: a disturbance vector initialization unit 111, a virtual sample prediction unit 112, and a disturbance vector update unit 113;
a disturbance vector initializing unit 111, configured to obtain an initial disturbance vector of the image detection model in a t-1 th stage of the t-1 th stages, and combine the initial disturbance vector of the t-1 th stage and a real sample of the t-1 th stage into an initial virtual sample;
a virtual sample prediction unit 112, configured to input an initial virtual sample to the image detection model of the t-1 st stage, and output a first sample prediction value corresponding to the initial virtual sample through the image detection model of the t-1 st stage;
The disturbance vector updating unit 113 is configured to correct the initial disturbance vector according to the first sample predicted value and the label information corresponding to the real sample of the t-1 st stage, obtain a disturbance vector of the t-1 st stage, and add the disturbance vector of the t-1 st stage to the disturbance vector set.
Alternatively, the disturbance vector update unit 113 may be specifically configured to:
determining label information corresponding to an initial virtual sample according to label information corresponding to a real sample in the t-1 stage;
determining a first cross entropy loss corresponding to the initial virtual sample according to the first sample predicted value and label information corresponding to the initial virtual sample;
and obtaining a gradient value of the first cross entropy loss, and carrying out iterative updating on the initial disturbance vector based on the initial disturbance vector and the gradient value of the first cross entropy loss to obtain a disturbance vector of the t-1 stage.
The specific functional implementation manners of the disturbance vector initialization unit 111, the virtual sample prediction unit 112, and the disturbance vector update unit 113 may refer to step S101 in the embodiment corresponding to fig. 3, and will not be described herein.
In one or more embodiments, the sample construction module 12 includes: a disturbance vector sampling unit 121, a virtual sample construction unit 122;
A disturbance vector sampling unit 121, configured to sample disturbance vectors of each stage included in the disturbance vector set, so as to obtain a target disturbance vector;
the virtual sample construction unit 122 is configured to perform a summation operation on the target disturbance vector and the real sample of the t stage, obtain a virtual sample of the t stage, and set tag information for the virtual sample of the t stage.
The specific functional implementation of the disturbance vector sampling unit 121 and the virtual sample constructing unit 122 may refer to step S102 in the embodiment corresponding to fig. 3, and will not be described herein.
In one or more embodiments, the parameter modification module 13 includes: a sample data prediction unit 131, a cross entropy loss determination unit 132, a virtual entropy loss determination unit 133, a first error loss determination unit 134, a second error loss determination unit 135, a model loss determination unit 136;
the sample data prediction unit 131 is configured to determine a real sample of the t-th stage, a virtual sample of the t-th stage, and a false sample of the t-th stage as a sample data set of the t-th stage, and output a second sample prediction value corresponding to sample data in the sample data set through an image detection model trained in the t-1-th stage;
A cross entropy loss determining unit 132, configured to determine a second cross entropy loss at the t-th stage according to the total number of samples in the sample data set, the second sample prediction value corresponding to the sample data in the sample data set, and the tag information corresponding to the sample data in the sample data set;
a virtual entropy loss determining unit 133, configured to determine a virtual entropy loss at the t-th stage according to the number of virtual samples in the sample data set, the second sample prediction value corresponding to the virtual samples in the sample data set, and the tag information corresponding to the virtual samples in the sample data set;
a first error loss determining unit 134, configured to determine a virtual mean square error loss at the t-th stage according to the number of virtual samples in the sample data set, the sample description feature of the virtual samples in the sample data set at the t-th stage, and the sample description feature of the virtual samples in the sample data set at the t-1 th stage;
a second error loss determining unit 135, configured to determine a true mean square error loss at the t-th stage according to the sample description feature of the true sample in the sample data set at the t-th stage and the sample description feature of the true sample in the sample data set at the t-1 stage;
The model loss determining unit 136 is configured to determine a model loss of the t-th stage according to the second cross entropy loss, the virtual mean square error loss, and the real mean square error loss, and correct network parameters of the image detection model trained in the t-1-th stage based on the model loss of the t-th stage, so as to obtain a target detection model.
Alternatively, the cross entropy loss determination unit 132 is specifically configured to:
carrying out logarithmic processing on the difference value between the target constant and the second sample predicted value corresponding to the sample data in the sample data set to obtain a first logarithmic value;
determining a difference value between the target constant and label information corresponding to sample data in the sample data set as a label difference value, and determining a product between the label difference value and a first logarithmic value as a first product value;
carrying out logarithmic processing on the second sample predicted value corresponding to the sample data in the sample data set to obtain a second logarithmic value;

summing the product of the label information corresponding to the sample data in the sample data set and the second logarithmic value with the first product value, and accumulating the sums over the sample data set to obtain a sample accumulated value corresponding to the sample data set;
a second cross entropy penalty for the t-th stage is determined based on the ratio between the sample accumulation value and the total number of samples in the sample dataset.
Alternatively, the virtual entropy loss determination unit 133 is specifically configured to:
carrying out logarithmic processing on a second sample predicted value corresponding to the virtual sample in the sample data set to obtain a third logarithmic value;
accumulating the product of the label information corresponding to the virtual sample in the sample data set and the third logarithmic value to obtain a virtual accumulated value corresponding to the sample data set;
and obtaining the virtual entropy loss of the t-th stage according to the ratio between the virtual accumulated value and the virtual sample number in the sample data set.
Alternatively, the first error loss determination unit 134 is specifically configured to:
outputting sample description characteristics of virtual samples in the sample data set in the t-1 stage through the image detection model of the t-1 stage;
outputting sample description characteristics of the virtual samples in the sample data set in the t stage through the image detection model of the t stage;
acquiring sample description characteristics of a virtual sample in a sample data set at a t-th stage and virtual sample errors between the sample description characteristics of the virtual sample in the sample data set at the t-1 th stage;
and determining the virtual mean square error loss of the t stage according to the virtual sample error corresponding to the virtual samples in the sample data set and the number of the virtual samples in the sample data set.
Alternatively, the second error loss determination unit 135 is specifically configured to:
outputting sample description characteristics of real samples in the sample data set in the t-1 stage through the image detection model in the t-1 stage;
outputting sample description characteristics of real samples in a sample data set in a t stage through an image detection model in the t stage;
acquiring sample description characteristics of real samples in a sample data set at a t-th stage and real sample errors of the real samples in the sample data set between the sample description characteristics of the real samples in the t-1 th stage;
and accumulating the real sample errors corresponding to the real samples in the sample data set to obtain the real mean square error loss of the t stage.
Optionally, the model loss determining unit 136 determines the model loss of the t-th stage according to the second cross entropy loss, the virtual entropy loss, the virtual mean square error loss, and the real mean square error loss, including:
obtaining a balance coefficient associated with the virtual mean square error loss and the real mean square error loss, and determining the product of the balance coefficient and the sum of the virtual mean square error loss and the real mean square error loss as the sample mean square error loss of the t-th stage;
And carrying out summation operation on the second cross entropy loss, the virtual entropy loss and the sample mean square error loss to obtain model loss in the t-th stage.
The specific functional implementation manner of the sample data prediction unit 131, the cross entropy loss determination unit 132, the virtual entropy loss determination unit 133, the first error loss determination unit 134, and the second error loss determination unit 135, and the model loss determination unit 136 may refer to steps S203 to S208 in the foregoing embodiment corresponding to fig. 5, which are not described herein again.
In one or more embodiments, the image data detecting apparatus 1 further includes: the device comprises a feature extraction module 14, a feature recognition module 15 and a detection result determination module 16;
the feature extraction module 14 is configured to acquire a source image containing a target object, input the source image into a target detection model, and output an object description feature corresponding to the source image through a feature extraction component in the target detection model;
the feature recognition module 15 is configured to recognize the object description feature through a classifier in the target detection model, and output a prediction probability value corresponding to the object description feature;
the detection result determining module 16 is configured to determine a detection result corresponding to the target object included in the source image according to the prediction probability value.
Optionally, the detection result determining module 16 is specifically configured to:
if the predicted probability value is greater than or equal to the probability threshold value, determining that a detection result corresponding to the target object contained in the source image is a false object;
if the predicted probability value is smaller than the probability threshold value, determining that the detection result corresponding to the target object contained in the source image is a real object.
The specific functional implementation manner of the feature extraction module 14, the feature recognition module 15, and the detection result determination module 16 may refer to steps S209 to S210 in the embodiment corresponding to fig. 5, which are not described herein.
In the embodiment of the application, an image detection model trained in the t-1 stage can be obtained, and a disturbance vector set corresponding to the image detection model is obtained, wherein disturbance vectors in the disturbance vector set are used for representing the historical characteristic distribution of the image detection model in the t-1 stage, different stages correspond to different false samples, and t is an integer greater than 1; the disturbance vector in the disturbance vector set and the real sample of the t stage can be combined to obtain a virtual sample of the t stage, and the obtained virtual sample can keep the historical characteristic distribution of the image detection model during the training of the previous stage and reduce the storage space of the image detection model during the training process; and correcting network parameters of the image detection model trained in the t-1 stage through the real sample, the false sample and the virtual sample in the t stage, so that a target detection model trained in the t stage can be obtained. In other words, in the training process of the t-th stage, when the new false sample in the t-th stage is trained, the training process is performed together with the virtual sample of the t-th stage, so that the historical characteristic distribution of the previous t-1 stages can be reserved, and the detection accuracy of the image detection model can be improved.
Further, referring to fig. 10, fig. 10 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 10, the computer device 1000 may be a user terminal (for example, the user terminal 10a in the embodiment corresponding to fig. 1) or a server (for example, the server 10d in the embodiment corresponding to fig. 1), which is not limited herein. For ease of understanding, the present application takes the case where the computer device is a user terminal as an example. The computer device 1000 may include: a processor 1001, a network interface 1004, and a memory 1005; in addition, the computer device 1000 may further include: a user interface 1003 and at least one communication bus 1002. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a standard wired interface and a wireless interface, among others. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory. The memory 1005 may optionally also be at least one storage device located remotely from the processor 1001. As shown in fig. 10, the memory 1005, which is one type of computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.
In the computer device 1000 shown in fig. 10, the network interface 1004 may provide network communication functions, and the optional user interface 1003 may further include a display screen (Display) and a keyboard (Keyboard). The user interface 1003 is mainly used to provide an input interface for the user, and the processor 1001 may be used to invoke the device control application program stored in the memory 1005 to implement:
acquiring an image detection model trained in the t-1 stage, and acquiring a disturbance vector set corresponding to the image detection model; the disturbance vector in the disturbance vector set is used for representing the historical characteristic distribution of the image detection model in t-1 stages, different stages correspond to different false samples, and t is an integer greater than 1;
combining the disturbance vector in the disturbance vector set with the real sample of the t stage to obtain a virtual sample of the t stage, and obtaining a false sample of the t stage;
correcting network parameters of the image detection model trained in the t-1 stage according to the real sample of the t stage, the virtual sample of the t stage and the false sample of the t stage to obtain a target detection model; the target detection model is an image detection model trained in the t-th stage, and is used for detecting the authenticity of a target object contained in the source image.
It should be understood that the computer device 1000 described in the embodiments of the present application can perform the description of the image data detection method in the embodiments corresponding to fig. 3 or fig. 5, and can also perform the description of the image data detection apparatus 1 in the embodiment corresponding to fig. 9; details are not repeated here. In addition, the description of the beneficial effects of the same method is not repeated.
Furthermore, it should be noted here that the embodiments of the present application further provide a computer-readable storage medium, in which the computer program executed by the image data detection apparatus 1 mentioned above is stored. The computer program includes program instructions which, when executed by a processor, can perform the description of the image data detection method in the embodiments corresponding to fig. 3 or fig. 5; details are not repeated here, and the description of the beneficial effects of the same method is likewise omitted. The storage medium may be a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like. For technical details not disclosed in the embodiments of the computer-readable storage medium of the present application, refer to the description of the method embodiments of the present application. As an example, the program instructions may be deployed to be executed on one computing device, on multiple computing devices located at one site, or on multiple computing devices distributed across multiple sites and interconnected by a communication network; the multiple computing devices distributed across multiple sites and interconnected by the communication network may constitute a blockchain system.
In addition, it should be noted that the embodiments of the present application also provide a computer program product or computer program, which may include computer instructions stored in a computer-readable storage medium. The processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the description of the image data detection method in the embodiments corresponding to fig. 3 or fig. 5; details are not repeated here, and the description of the beneficial effects of the same method is likewise omitted. For technical details not disclosed in the computer program product or computer program embodiments of the present application, refer to the description of the method embodiments of the present application.
The terms "first", "second", and the like in the description, claims, and drawings of the embodiments of the application are used to distinguish between different media content, not to describe a particular sequential order. Furthermore, the term "include" and any variations thereof are intended to cover a non-exclusive inclusion. For example, a process, method, apparatus, article, or device that comprises a list of steps or modules is not limited to the listed steps or modules, but may optionally include other steps or modules not listed, or other steps or modules inherent to such a process, method, apparatus, article, or device.
Those of ordinary skill in the art will appreciate that the units and algorithm steps described in connection with the embodiments disclosed herein may be implemented in electronic hardware, in computer software, or in a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of the examples have been described above generally in terms of function. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The method and related apparatus provided in the embodiments of the present application are described with reference to the flowcharts and/or schematic structural diagrams provided in the embodiments of the present application. Each flow and/or block of the flowcharts and/or schematic structural diagrams, and combinations of flows and/or blocks therein, may be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing apparatus to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing apparatus create means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the schematic structural diagrams. These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing apparatus to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the schematic structural diagrams. These computer program instructions may also be loaded onto a computer or another programmable data processing apparatus, so that a series of operational steps are performed on the computer or other programmable apparatus to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable apparatus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the schematic structural diagrams.
The foregoing disclosure is illustrative of the present application and is not to be construed as limiting the scope of the application, which is defined by the appended claims.

Claims (16)

1. An image data detection method, comprising:
acquiring an image detection model trained in the t-1 stage, and acquiring a disturbance vector set corresponding to the image detection model; the disturbance vector in the disturbance vector set is used for representing the historical characteristic distribution of the image detection model in t-1 stages, different stages correspond to different false samples, and t is an integer greater than 1;
combining the disturbance vector in the disturbance vector set with a real sample of a t-th stage to obtain a virtual sample of the t-th stage, and obtaining a false sample of the t-th stage;
correcting network parameters of the image detection model trained in the t-1 stage according to the real sample of the t stage, the virtual sample of the t stage and the false sample of the t stage to obtain a target detection model; the target detection model is an image detection model trained in the t-th stage, and is used for detecting the authenticity of a target object contained in a source image.
2. The method of claim 1, wherein the obtaining the set of disturbance vectors corresponding to the image detection model comprises:
acquiring an initial disturbance vector of the image detection model at the t-1 stage of the t-1 stages, and combining the initial disturbance vector of the t-1 stage and a real sample of the t-1 stage into an initial virtual sample;
inputting the initial virtual sample into the image detection model of the t-1 stage, and outputting a first sample predicted value corresponding to the initial virtual sample through the image detection model of the t-1 stage;
and correcting the initial disturbance vector according to the first sample predicted value and the label information corresponding to the real sample of the t-1 stage to obtain a disturbance vector of the t-1 stage, and adding the disturbance vector of the t-1 stage to the disturbance vector set.
3. The method according to claim 2, wherein the correcting the initial disturbance vector according to the first sample predicted value and the label information corresponding to the real sample of the t-1 th stage to obtain the disturbance vector of the t-1 th stage includes:
determining the label information corresponding to the initial virtual sample according to the label information corresponding to the real sample in the t-1 stage;
determining a first cross entropy loss corresponding to the initial virtual sample according to the first sample predicted value and the label information corresponding to the initial virtual sample;
and acquiring a gradient value of the first cross entropy loss, and carrying out iterative updating on the initial disturbance vector based on the initial disturbance vector and the gradient value of the first cross entropy loss to obtain the disturbance vector of the t-1 stage.
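As a hedged illustration of claims 2-3, the disturbance vector of the t-1 stage can be obtained by iteratively updating an initial disturbance vector with the gradient of the first cross entropy loss while the stage t-1 model stays frozen. In the sketch below, the step size, the number of iterations, the use of the gradient sign, and the descent direction are all assumptions:

    import torch
    import torch.nn.functional as F

    def learn_stage_perturbation(frozen_model, real_batch, labels, steps=10, step_size=0.01):
        # Initial disturbance vector, the same shape as the real sample batch.
        delta = torch.zeros_like(real_batch, requires_grad=True)
        for _ in range(steps):
            virtual = real_batch + delta                # initial virtual sample
            pred = frozen_model(virtual).squeeze(1)     # first sample predicted value
            loss = F.binary_cross_entropy_with_logits(pred, labels)
            grad, = torch.autograd.grad(loss, delta)    # gradient of the first cross entropy loss
            # Iterative update based on the current vector and the gradient value;
            # the signed step and its direction are interpretive choices.
            delta = (delta - step_size * grad.sign()).detach().requires_grad_(True)
        return delta.detach()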
4. The method of claim 1, wherein the merging the disturbance vector in the disturbance vector set with the real sample of the t stage to obtain the virtual sample of the t stage comprises:
sampling disturbance vectors of each stage contained in the disturbance vector set to obtain a target disturbance vector;
and carrying out summation operation on the target disturbance vector and the real sample of the t stage to obtain the virtual sample of the t stage, and setting tag information for the virtual sample of the t stage.
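A minimal sketch of claim 4, assuming the disturbance vector set is kept as a list with one vector per past stage and that virtual samples are tagged with a hypothetical label value:

    import random
    import torch

    def make_virtual_sample(real_image, perturbation_set, virtual_label=1.0):
        target = random.choice(perturbation_set)   # sample a target disturbance vector
        virtual = real_image + target              # summation with the stage-t real sample
        tag = torch.tensor(virtual_label)          # tag information (value assumed)
        return virtual, tag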
5. The method according to claim 1, wherein the correcting network parameters of the t-1 th stage trained image detection model according to the real sample of the t-th stage, the virtual sample of the t-th stage, and the false sample of the t-th stage to obtain a target detection model includes:
determining the real sample of the t stage, the virtual sample of the t stage and the false sample of the t stage as a sample data set of the t stage, and outputting a second sample predicted value corresponding to sample data in the sample data set through an image detection model trained by the t-1 stage;
determining a second cross entropy loss of the t-th stage according to the total number of samples in the sample data set, a second sample predicted value corresponding to the sample data in the sample data set and label information corresponding to the sample data in the sample data set;
determining the virtual entropy loss of the t-th stage according to the number of virtual samples in the sample data set, a second sample predicted value corresponding to the virtual samples in the sample data set and label information corresponding to the virtual samples in the sample data set;
determining the virtual mean square error loss of the t stage according to the number of the virtual samples in the sample data set, the sample description characteristics of the virtual samples in the sample data set at the t stage and the sample description characteristics of the virtual samples in the sample data set at the t-1 stage;
determining the real mean square error loss of the t-th stage according to the sample description characteristic of the real sample in the sample data set at the t-th stage and the sample description characteristic of the real sample in the sample data set at the t-1 stage;
and determining the model loss of the t stage according to the second cross entropy loss, the virtual mean square error loss and the real mean square error loss, and correcting the network parameters of the image detection model trained in the t-1 stage based on the model loss of the t stage to obtain a target detection model.
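The following sketch assembles the losses of claims 5-10 in one place (the virtual entropy loss of claim 7 enters the final sum via claim 10). The `features`/`classifier` split of the model, the boolean masks, the 1e-8 stabilizer, the sign of the virtual entropy term, and the exact reductions are interpretive assumptions, not details fixed by the application:

    import torch
    import torch.nn.functional as F

    def stage_t_loss(model_t, model_prev, x, y, is_virtual, is_real, lam=1.0):
        # Sample description characteristics from the stage-t and stage t-1 models.
        feat_t = model_t.features(x)
        with torch.no_grad():
            feat_prev = model_prev.features(x)      # stage t-1 model stays frozen
        # Second sample predicted values for all sample data in the set.
        p = torch.sigmoid(model_t.classifier(feat_t)).squeeze(1)

        ce = F.binary_cross_entropy(p, y)           # second cross entropy loss (claim 6)
        # Virtual entropy loss (claim 7): label-weighted log-probabilities of virtual samples.
        ve = -(y[is_virtual] * torch.log(p[is_virtual] + 1e-8)).mean()
        diff = (feat_t - feat_prev) ** 2
        mse_v = diff[is_virtual].sum(dim=1).mean()  # virtual mean square error loss (claim 8)
        mse_r = diff[is_real].sum(dim=1).sum()      # real mean square error loss, accumulated (claim 9)
        # Model loss of the t-th stage (claim 10), with balance coefficient lam.
        return ce + ve + lam * (mse_v + mse_r)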
6. The method of claim 5, wherein determining the second cross entropy loss of the t-th stage based on the total number of samples in the sample data set, the second sample prediction value corresponding to the sample data in the sample data set, and the tag information corresponding to the sample data in the sample data set, comprises:
carrying out logarithmic processing on the difference value between the target constant and the second sample predicted value corresponding to the sample data in the sample data set to obtain a first logarithmic value;
determining a difference value between the target constant and label information corresponding to sample data in the sample data set as a label difference value, and determining a product between the label difference value and the first logarithmic value as a first product value;
carrying out logarithmic processing on the sum of the first product value and a second sample predicted value corresponding to sample data in the sample data set to obtain a second logarithmic value;
accumulating the product of the label information corresponding to the sample data in the sample data set and the second logarithmic value to obtain a sample accumulated value corresponding to the sample data set;
determining a second cross entropy loss for the t-th stage based on a ratio between the sample accumulation value and a total number of samples in the sample dataset.
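Taken together, the steps of claim 6 (with the target constant read as 1) appear to correspond to the standard binary cross-entropy; in conventional notation this interpretation reads

    \mathcal{L}_{ce} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right]

where N is the total number of samples in the sample data set, p_i is the second sample predicted value, and y_i is the corresponding label information; the overall sign is an assumption, since the claim fixes only the ratio between the sample accumulation value and N.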
7. The method of claim 5, wherein determining the virtual entropy loss of the t-th stage according to the number of virtual samples in the sample data set, the second sample prediction value corresponding to the virtual samples in the sample data set, and the tag information corresponding to the virtual samples in the sample data set, comprises:
carrying out logarithmic processing on a second sample predicted value corresponding to the virtual sample in the sample data set to obtain a third logarithmic value;
accumulating the product of the label information corresponding to the virtual sample in the sample data set and the third logarithmic value to obtain a virtual accumulated value corresponding to the sample data set;
and obtaining the virtual entropy loss of the t stage according to the ratio between the virtual accumulated value and the virtual sample number in the sample data set.
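Under the same interpretation, the virtual entropy loss of claim 7 can be written as

    \mathcal{L}_{ve} = -\frac{1}{N_v} \sum_{j=1}^{N_v} y_j \log p_j

where N_v is the number of virtual samples in the sample data set and the negative sign is again an assumption.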
8. The method of claim 5, wherein said determining the virtual mean square error loss for the t-th phase based on the number of virtual samples in the sample data set, the sample description characteristics for the virtual samples in the sample data set at the t-th phase, and the sample description characteristics for the virtual samples in the sample data set at the t-1 th phase comprises:
outputting sample description characteristics of the virtual samples in the sample data set in the t-1 stage through the image detection model of the t-1 stage;
outputting sample description characteristics of the virtual samples in the sample data set in the t stage through the image detection model of the t stage;
acquiring virtual sample errors between the sample description characteristics of the virtual samples in the sample data set at the t-th stage and the sample description characteristics of the virtual samples at the t-1 stage;
and determining the virtual mean square error loss of the t stage according to the virtual sample error corresponding to the virtual samples in the sample data set and the number of the virtual samples in the sample data set.
9. The method of claim 5, wherein said determining the true mean square error loss for the t-th stage based on the sample description characteristic for the true sample in the sample dataset at the t-th stage and the sample description characteristic for the true sample in the sample dataset at the t-1 stage comprises:
outputting sample description characteristics of real samples in the t-1 stage in the sample data set through the image detection model of the t-1 stage;
outputting sample description characteristics of real samples in the sample data set in the t stage through the image detection model in the t stage;
acquiring real sample errors between the sample description characteristics of the real samples in the sample data set at the t-th stage and the sample description characteristics of the real samples at the t-1 stage;
and accumulating the real sample errors corresponding to the real samples in the sample data set to obtain the real mean square error loss of the t stage.
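Claims 8 and 9 admit a compact reading as feature-level mean square errors between the stage-t and stage t-1 models, in the style of knowledge distillation; the notation f_t and f_{t-1} for the two feature extractors is introduced here for illustration only:

    \mathcal{L}_{mse}^{v} = \frac{1}{N_v} \sum_{j=1}^{N_v} \left\| f_t(x_j^{v}) - f_{t-1}(x_j^{v}) \right\|_2^2, \qquad \mathcal{L}_{mse}^{r} = \sum_{k} \left\| f_t(x_k^{r}) - f_{t-1}(x_k^{r}) \right\|_2^2

where x^v and x^r denote the virtual and real samples in the sample data set; note that claim 9 accumulates the real sample errors without dividing by their number.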
10. The method of claim 5, wherein said determining the model loss of the t-th stage based on the second cross entropy loss, the virtual mean square error loss, and the real mean square error loss comprises:
obtaining a balance coefficient associated with the virtual mean square error loss and the real mean square error loss, and determining the product of the balance coefficient and the sum of the virtual mean square error loss and the real mean square error loss as the sample mean square error loss of the t-th stage;
and carrying out summation operation on the second cross entropy loss, the virtual entropy loss and the sample mean square error loss to obtain the model loss of the t-th stage.
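Combining the above, the model loss of the t-th stage can therefore be read as

    \mathcal{L}^{(t)} = \mathcal{L}_{ce} + \mathcal{L}_{ve} + \lambda \left( \mathcal{L}_{mse}^{v} + \mathcal{L}_{mse}^{r} \right)

where \lambda is the balance coefficient of claim 10; this is an interpretive summary, not a verbatim formula from the application.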
11. The method as recited in claim 1, further comprising:
acquiring a source image containing the target object, inputting the source image into the target detection model, and outputting object description features corresponding to the source image through a feature extraction component in the target detection model;
identifying the object description features through a classifier in the target detection model, and outputting predicted probability values corresponding to the object description features;
and determining a detection result corresponding to the target object contained in the source image according to the prediction probability value.
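A minimal inference sketch for claims 11-12 follows; the `features`/`classifier` split of the target detection model and the 0.5 default threshold are assumptions rather than values fixed by the application:

    import torch

    @torch.no_grad()
    def detect(target_model, source_image, threshold=0.5):
        # Extract object description features from the source image and classify them.
        feats = target_model.features(source_image.unsqueeze(0))
        prob = torch.sigmoid(target_model.classifier(feats)).item()  # predicted probability value
        # Threshold the predicted probability value: at or above -> false object.
        return "false object" if prob >= threshold else "real object"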
12. The method according to claim 11, wherein determining a detection result corresponding to the target object included in the source image according to the prediction probability value includes:
if the predicted probability value is greater than or equal to a probability threshold value, determining that a detection result corresponding to the target object contained in the source image is a false object;
and if the predicted probability value is smaller than the probability threshold value, determining that the detection result corresponding to the target object contained in the source image is a real object.
13. An image data detecting apparatus, comprising:
the set acquisition module is used for acquiring an image detection model trained in the t-1 stage and acquiring a disturbance vector set corresponding to the image detection model; the disturbance vector in the disturbance vector set is used for representing the historical characteristic distribution of the image detection model in t-1 stages, different stages correspond to different false samples, and t is an integer greater than 1;
the sample construction module is used for combining the disturbance vector in the disturbance vector set with a real sample of a t-th stage to obtain a virtual sample of the t-th stage, and obtaining a false sample of the t-th stage;
the parameter correction module is used for correcting network parameters of the image detection model trained in the t-1 th stage according to the real sample of the t stage, the virtual sample of the t stage and the false sample of the t stage to obtain a target detection model; the target detection model is an image detection model trained in the t-th stage, and is used for detecting the authenticity of a target object contained in a source image.
14. A computer device comprising a memory and a processor;
the memory is connected to the processor, the memory is used for storing a computer program, and the processor is used for calling the computer program to enable the computer device to execute the method of any one of claims 1 to 12.
15. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program adapted to be loaded and executed by a processor to cause a computer device having the processor to perform the method of any of claims 1 to 12.
16. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the method of any of claims 1 to 12.
CN202211418837.9A 2022-11-14 2022-11-14 Image data detection method, device, equipment and medium Pending CN116977812A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211418837.9A CN116977812A (en) 2022-11-14 2022-11-14 Image data detection method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211418837.9A CN116977812A (en) 2022-11-14 2022-11-14 Image data detection method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN116977812A true CN116977812A (en) 2023-10-31

Family

ID=88480310

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211418837.9A Pending CN116977812A (en) 2022-11-14 2022-11-14 Image data detection method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN116977812A (en)

Legal Events

Date Code Title Description
PB01 Publication