CN114239751A - Data annotation error detection method and device based on multiple decoders - Google Patents

Data annotation error detection method and device based on multiple decoders

Info

Publication number
CN114239751A
CN114239751A (application CN202111654954.0A)
Authority
CN
China
Prior art keywords
sample
decoder
positive
samples
negative
Prior art date
Legal status
Pending
Application number
CN202111654954.0A
Other languages
Chinese (zh)
Inventor
周水庚
王禹博
张吉
Current Assignee
Fudan University
Zhejiang Lab
Original Assignee
Fudan University
Zhejiang Lab
Priority date
Filing date
Publication date
Application filed by Fudan University, Zhejiang Lab filed Critical Fudan University
Priority to CN202111654954.0A
Publication of CN114239751A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-decoder-based data annotation error detection method and device. An image data set is established and used to train a semi-supervised anomaly detection neural network. During training, the hidden-layer features produced by an encoder are reconstructed by multiple decoders: the positive-sample decoder is an ordinary single decoder, while the negative-sample decoder is a dedicated multi-channel decoder, a design that helps characterize and distinguish the different properties of positive and negative samples. Anomaly detection on image data to be inspected is then performed by comparing the reconstruction quality of the two decoders on each data sample. The invention improves the negative-sample decoder's ability to fit anomalous samples, thereby widening the anomaly-score gap between normal and anomalous samples and improving anomaly detection performance.

Description

Data annotation error detection method and device based on multiple decoders
Technical Field
The invention relates to the technical field of machine learning, and in particular to a data annotation error detection method and device based on multiple decoders.
Background
During data annotation, errors caused by annotator inattention are common. Locating these errors by manual screening incurs substantial time and labor costs. Because a certain amount of correctly labeled data is easy to obtain or prepare in advance, semi-supervised anomaly detection can be used to reduce the heavy cost of manually inspecting annotated data.
Anomaly detection, as used here, refers to screening out the abnormal portion of given data, i.e., the portion that deviates from the normal pattern. Classes missed by the original definition and noise points produced by external interference can both give rise to abnormal points. By convention, such abnormal points are called outliers, while normal data are called inliers. Anomaly detection is applied in many scientific and industrial fields, such as novelty detection in machine vision and the search for new effective drugs in biochemistry and pharmacy.
Anomaly detection is generally treated as a one-class classification task: for lack of relevant knowledge, the scope and concrete nature of the negative class cannot be well defined. In practice it is not easy to collect negative samples or to verify their authenticity, and in many cases a considerable number of outliers cannot be determined or defined in advance; for this reason such data are usually called novelty points or anomaly points. The positive samples, by contrast, can be well described by analyzing and characterizing the training data. Multi-class classifiers are inherently insensitive to unseen classes, so attempting to solve the anomaly detection problem with a general multi-class classification strategy is not feasible.
Many strategies and approaches have been demonstrated in research to date for the anomaly detection task. In most cases these methods fall into three categories: 1) summarize the properties of normal samples from the given positive data and build a model to characterize them; 2) set rules and classify samples that do not conform to the definition of normal as outliers; 3) separate outliers from the data set based on statistical or geometric measures of abnormality. Depending on their design, data models differ in their capacity to fit the underlying data properties. Most currently popular models are linear in nature, and a salient characteristic of such models is their relatively limited capacity. Kernel-function techniques can improve the fitting ability of a model, but such strategies break down once the data dimensionality is high or the data volume is large.
With the continuing spread of deep learning across many fields, deep neural networks have proven more effective at fitting high-dimensional features, which can be attributed to their strength in feature engineering. Although deep learning is well suited to extracting representations, in the context of anomaly detection outliers are often very hard to collect, leading to extreme class imbalance in the training data. Training a deep neural network for anomaly detection in a conventional supervised manner is therefore not feasible.
As mentioned above, the anomaly detection task can be treated as a classification task. Practitioners have made attempts to learn such a classifier; however, in most of this related work, attempts to build threshold-free discriminative anomaly detection models have been unsuccessful, and a reasonable threshold still has to be selected by hand, which requires a great deal of manual effort and parameter tuning. Experience in production environments shows that the types of anomalies and their attributes usually cannot be deduced in advance, so trying to describe all anomalies with a fixed threshold is unreasonable.
Another drawback of this prior art is that it assumes by default a data environment in which all training samples are positive, so only positive sample data can be used during training; this invites the over-fitting problem common in machine learning, which greatly reduces the generalization ability of the model. For this reason, the simple strategy of feeding positive samples into a deep neural network and training a classifier cannot, at a macroscopic level, solve the anomaly detection problem.
Some techniques exist in practice to limit the damage over-fitting does to model performance; early stopping, for example, is widely used. However, choosing the stopping point is largely arbitrary, and the time at which generalization performance is best is usually hard to determine. In semi-supervised anomaly detection and related tasks, the most common current approach is representation learning: a semi-supervised representation learning task is designed according to the type of training data, and anomalous samples are finally detected through differences in training speed or training difficulty between anomalous and normal samples.
Disclosure of Invention
To overcome the defects of the prior art, that is, to improve the negative-sample decoder's ability to fit anomalous samples, widen the anomaly-score gap between normal and anomalous samples, and improve anomaly detection performance, the invention adopts the following technical scheme:
a data labeling error detection method based on a multi-decoder comprises the following steps:
s1, establishing image data sets of the positive samples and the unlabelled samples, and determining the attributes of the unlabelled samples, namely classifying the unlabelled samples into the positive samples and the negative samples; training a semi-supervised anomaly detection neural network by using an image data set;
s2, the positive sample and the unmarked sample pass through an encoder to obtain the hidden layer characteristics of the image data; the encoder is responsible for compressing the input feature code to a low-dimensional space to form a hidden layer feature;
s3, the hidden layer characteristics pass through a positive sample decoder and a negative sample decoder, the positive sample passes through the positive sample decoder to obtain the reconstruction result of the positive sample, the unmarked sample passes through the positive sample decoder and the negative sample decoder respectively to obtain the reconstruction result of the unmarked sample under the positive sample decoder and the negative sample decoder respectively, and the competitive reconstruction is carried out on the sample, so the design is beneficial to depicting and distinguishing different properties of the positive sample and the negative sample;
s4, comparing the reconstruction errors of the positive and negative sample decoders, calculating the loss of the negative sample in the unmarked sample under the minimum value of the reconstruction errors of the positive and negative sample decoders and the reconstruction error of the positive sample jointly, and training the decoder and/or the encoder;
S5, after training is complete, perform anomaly detection on the image data to be inspected.
Further, in S1 the image data set is preprocessed, including dimensionality reduction, flattening, and normalization of the image data. Raw images are large, and using them directly would make the data dimensionality too high; the dimensionality is therefore reduced and the features flattened, and normalization is applied to ensure data quality.
Further, in S1 the image data are shuffled into random batches, which removes the unevenness introduced during data collection; not processing all training samples at once also reduces the memory burden.
Further, the positive-sample decoder in S3 is a single decoder, while the negative-sample decoder is a group of negative-sample decoders. Through the negative-sample decoders' learning of the regularities of negative samples, the reconstruction abilities of the two sides on a sample can be compared and the sample's class determined; the outputs of the group are fused using an attention mechanism. With this design, positive samples obtain smaller reconstruction errors in the positive-sample decoder; likewise, the negative-sample decoder gives smaller errors for negative samples. Since the model learns the intrinsic labels of the unlabeled samples during training, no separate test phase is needed.
Furthermore, the negative-sample decoder is a multi-channel negative-sample decoder. Considering that inliers, i.e., positive samples, are relatively uniform in nature while outliers, i.e., negative samples, are closer to a mixture distribution, several decoders are used to fit this distribution: the multiple channels handle different sub-distributions among the outliers separately, which reduces reconstruction error and improves performance.
Further, in S4 the loss after the reconstruction errors is computed with the following loss function:

$$\mathcal{L} = \sum_{x \in X_p} \left\| x - D_p(E(x)) \right\|_2^2 + \sum_{x \in X_u} \min\left( \left\| x - D_p(E(x)) \right\|_2^2,\; \min_j \left\| x - D_n^{(j)}(E(x)) \right\|_2^2 \right)$$

where $X_p$ denotes the positive sample set, $X_u$ the unlabeled sample set, $E$ the encoder, $D_p(E(x))$ the reconstruction of a sample at the positive-sample decoder, $D_n^{(j)}(E(x))$ its reconstruction at the $j$-th member of the negative-sample decoder group, $j$ the index over the negative decoder group, and $\|\cdot\|_2^2$ the squared L2 norm.
Further, in S4 the gradient of the loss function with the model parameters as independent variables is obtained by a gradient descent algorithm, and the parameters are iterated with forward propagation and error back-propagation (BP); the process from preprocessing the image data set in S1 to computing the loss function in S4 is repeated until the loss finally stabilizes near some value, at which point training of the samples is complete.
Further, erroneously labeled data in S4 manifest as samples whose reconstruction error under the positive-sample decoder D_p is greater than their reconstruction error under the negative-sample decoder group {D_n^(j)}.
Thanks to the large amount of supervised positive sample data, the positive-sample decoder can effectively learn the characteristics and regularities of the positive samples, so most outlier samples fall to the negative-sample decoder. The detection result then depends on a sample point's comparative reconstruction errors across the decoders. For example, for an unlabeled sample whose true attribute is an inlier, i.e., a positive sample, the error between the input features and the output reconstructed by the positive-sample decoder is smaller than that of the negative-sample decoder, and the sample is ultimately predicted to be positive.
A multi-decoder-based data annotation error detection apparatus comprises one or more processors configured to implement the multi-decoder-based data annotation error detection method of any of the above embodiments.
The invention has the advantages and beneficial effects that:
the invention can simultaneously utilize the supervision information and the unsupervised information, does not need to estimate a prior threshold value for error marking, and the structure of the multi-decoder can better describe the complex property of an error sample. Experiments show that the method can obtain good results on the problem of data annotation errors.
Drawings
Fig. 1 is an overall frame diagram of the present invention.
FIG. 2 is a flow chart of a method for detecting data labeling errors based on multiple decoders according to the present invention.
FIG. 3 is a block diagram of a data labeling error detection apparatus based on multiple decoders according to the present invention.
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples are given by way of illustration and explanation only and do not limit the invention.
Given the efficiency of some unsupervised outlier separation strategies, it is feasible to extract information from unlabeled data, i.e., to use such data as training samples; using positive samples as supervision information at the same time can further improve performance. The invention completes training with a semi-supervised anomaly detection method that uses multiple decoders, taking both positively labeled samples and unlabeled samples as training data.
As shown in fig. 1 and fig. 2, a method for detecting data annotation errors based on multiple decoders includes the following steps:
s1, establishing image data sets of the positive samples and the unlabelled samples, and determining the attributes of the unlabelled samples, namely classifying the unlabelled samples into the positive samples and the negative samples; the semi-supervised anomaly detection neural network is trained using the image dataset.
The image dataset is divided into positive and unmarked samples, and the attributes of the unmarked samples are determined, i.e. the unmarked samples are classified into positive and negative samples.
The image data set is preprocessed, including dimensionality reduction, flattening of the image features, and normalization. Raw images are large, and using them directly would make the data dimensionality too high, so common image preprocessing strategies are applied, e.g., feature engineering with a pretrained convolutional neural network such as Inception; this reduces the data dimensionality and flattens the features, and a normalization step can also be performed to ensure data quality.
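As an illustration of this preprocessing step, the sketch below extracts flattened, normalized features with a pretrained Inception network. The patent names Inception only as an example, so the choice of torchvision's Inception v3 and the 2048-dimensional pooled output are assumptions, not the patent's prescribed pipeline.

```python
# Hedged sketch of the Inception-based feature engineering described above.
# Assumption: torchvision's pretrained Inception v3 stands in for the
# unspecified pretrained CNN; features come from the 2048-d pooled layer.
import torch
import torchvision.models as models
import torchvision.transforms as T

inception = models.inception_v3(weights=models.Inception_V3_Weights.IMAGENET1K_V1)
inception.fc = torch.nn.Identity()        # drop the classifier head, keep pooled features
inception.eval()

preprocess = T.Compose([
    T.Resize(299), T.CenterCrop(299),     # Inception v3 expects 299x299 inputs
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet normalization
                std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_features(pil_images):
    """Map a list of PIL images to flattened, normalized feature vectors."""
    batch = torch.stack([preprocess(img) for img in pil_images])
    return inception(batch)               # shape (N, 2048)
```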
The image data are shuffled into random batches, which removes the unevenness introduced during data collection. Rather than processing all training samples at once, which would strain memory, one batch of data is fed into the model at a time; in practice the batch size is set to 32 or 64.
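A minimal batching sketch follows, assuming `features` (an N x d feature tensor) and `is_pos` (an N-element boolean positive-label mask) are outputs of the preprocessing step; both names are hypothetical.

```python
from torch.utils.data import DataLoader, TensorDataset

# Shuffled mini-batches of size 32, as described above; `features` and `is_pos`
# are assumed to come from the feature-extraction sketch.
loader = DataLoader(TensorDataset(features, is_pos), batch_size=32, shuffle=True)
```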
Specifically, the data to be trained are collected, comprising positive sample data well labeled and identified as inliers, and unlabeled sample data without class distinctions. These data need not be stored separately; they are placed together directly in mixed form. Given a sample set, the samples divide into a points sharing the common property of a class of interest, i.e., a positive samples, and b points whose properties are unknown, i.e., b unlabeled samples, where the intrinsic nature of an unlabeled sample may be positive or negative. The semi-supervised anomaly detection task specified here determines the attributes of the unlabeled samples on the basis of this sample set, i.e., classifies each unlabeled sample as positive or negative.
As a baseline, consider the commonly adopted supervised anomaly detection scheme. There the training data contain only positive samples and no negative samples; the model fits the positive training data as closely as possible, capturing their properties and regularities. Consequently, when negative samples appear in the test data, the error between their reconstructions and the input data is large.
S2, pass the positive samples and the unlabeled samples through an encoder to obtain hidden-layer features of the image data; the encoder compresses the input features into a low-dimensional space to form the hidden-layer features.
S3, pass the hidden-layer features through a positive-sample decoder and a negative-sample decoder: the positive samples pass through the positive-sample decoder to obtain their reconstructions, while the unlabeled samples pass through both the positive-sample decoder and the negative-sample decoder to obtain their reconstructions under each, so that the samples are reconstructed competitively.
The positive-sample decoder is a single decoder, while the negative-sample decoder is a group of negative-sample decoders, i.e., a multi-channel negative-sample decoder. Through the negative-sample decoders' learning of the regularities of negative samples, the reconstruction abilities of the two sides on a sample can be compared and the sample's class determined; the outputs of the group are fused using an attention mechanism, as sketched below. With this design, positive samples obtain smaller reconstruction errors in the positive-sample decoder; likewise, the negative-sample decoder gives smaller errors for negative samples. Since the model learns the intrinsic labels of the unlabeled samples during training, no separate test phase is needed.
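The patent does not spell out the form of this attention fusion. One plausible reading, sketched below, weights each negative channel's reconstruction by a softmax over its negated reconstruction error, so better-fitting channels dominate the fused output; this softmax-over-errors form is an assumption.

```python
import torch
import torch.nn.functional as F

def fuse_negative_reconstructions(rec_negs: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    """Fuse the K negative channels' reconstructions of x with attention weights.
    rec_negs: (K, N, d) per-channel reconstructions; x: (N, d) inputs.
    The weighting scheme is an assumption, not the patent's stated formula."""
    err = ((x.unsqueeze(0) - rec_negs) ** 2).sum(dim=2)   # (K, N) per-channel errors
    attn = F.softmax(-err, dim=0)                         # smaller error -> larger weight
    return (attn.unsqueeze(-1) * rec_negs).sum(dim=0)     # (N, d) fused reconstruction
```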
The encoder is a neural network capable of reducing the dimensionality of the data; different structures such as a multilayer perceptron, a CNN, or an LSTM can be chosen to suit the specific data. Positive and negative samples are treated identically at this layer, which maps sample points from the high-dimensional input space to the low-dimensional hidden feature space.
The decoder restores the hidden-layer features to the shape of the sample input space to complete the decoding operation; structurally, it reverses the order of the encoder's layers and takes the inverse operations. Viewed from the result, the features output by the decoder have exactly the same shape as the features fed to the encoder.
The decoder group performs the same kind of operation as the positive-sample decoder, i.e., restores the hidden-layer features to the shape of the sample input space. The difference is that inliers, i.e., positive samples, are relatively uniform in nature, whereas outliers, i.e., negative samples, are closer to a mixture distribution; multiple decoders are therefore used to fit this distribution, with the multiple channels handling different sub-distributions of the outliers separately, which reduces reconstruction error and improves performance.
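A minimal PyTorch sketch of this layout follows: a shared encoder, a single positive-sample decoder D_p, and a group of K negative-sample decoders, each decoder mirroring the encoder's layers in reverse. The layer widths, the hidden dimension, and K = 4 are illustrative assumptions; the patent fixes only the overall layout.

```python
import torch
import torch.nn as nn

class CompetitiveAE(nn.Module):
    """Encoder + positive decoder D_p + K-channel negative decoder group {D_n^(j)}.
    Layer sizes and K are assumptions for illustration."""
    def __init__(self, in_dim: int = 2048, hid: int = 64, K: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, 128), nn.ReLU(),
            nn.Linear(128, hid),
        )
        def mirror():  # decoder = encoder layers in reverse order
            return nn.Sequential(
                nn.Linear(hid, 128), nn.ReLU(),
                nn.Linear(128, 512), nn.ReLU(),
                nn.Linear(512, in_dim),
            )
        self.dec_pos = mirror()                                     # D_p
        self.dec_neg = nn.ModuleList([mirror() for _ in range(K)])  # {D_n^(j)}

    def forward(self, x: torch.Tensor):
        z = self.encoder(x)                                   # hidden-layer features
        rec_pos = self.dec_pos(z)                             # (N, in_dim)
        rec_negs = torch.stack([d(z) for d in self.dec_neg])  # (K, N, in_dim)
        return rec_pos, rec_negs
```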
S4, compare the reconstruction errors of the positive-sample decoder and the negative-sample decoder; compute the loss jointly from the minimum of the two decoders' reconstruction errors on the negative samples among the unlabeled samples and from the reconstruction error of the positive samples, and train the decoders and/or the encoder.
The loss after the reconstruction errors is computed with the following loss function:

$$\mathcal{L} = \sum_{x \in X_p} \left\| x - D_p(E(x)) \right\|_2^2 + \sum_{x \in X_u} \min\left( \left\| x - D_p(E(x)) \right\|_2^2,\; \min_j \left\| x - D_n^{(j)}(E(x)) \right\|_2^2 \right)$$

where $X_p$ denotes the positive sample set, $X_u$ the unlabeled sample set, $E$ the encoder, $D_p(E(x))$ the reconstruction of a sample at the positive-sample decoder, $D_n^{(j)}(E(x))$ its reconstruction at the $j$-th member of the negative-sample decoder group, $j$ the index over the negative decoder group, and $\|\cdot\|_2^2$ the squared L2 norm.
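Read off the definitions above, the loss can be sketched as follows; `labeled_pos_mask` (a hypothetical name) marks the labeled positives in a mixed batch, and the tensor shapes follow the `CompetitiveAE` sketch above.

```python
import torch

def competitive_loss(x, rec_pos, rec_negs, labeled_pos_mask):
    """Sketch of the loss above: labeled positives pay their D_p reconstruction
    error; unlabeled samples pay the minimum error over D_p and every negative
    channel D_n^(j). Shapes: x (N, d), rec_pos (N, d), rec_negs (K, N, d)."""
    err_pos = ((x - rec_pos) ** 2).sum(dim=1)                # ||x - D_p(E(x))||_2^2
    err_neg = ((x.unsqueeze(0) - rec_negs) ** 2).sum(dim=2)  # (K, N)
    err_min = torch.minimum(err_pos, err_neg.min(dim=0).values)
    return err_pos[labeled_pos_mask].sum() + err_min[~labeled_pos_mask].sum()
```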
The gradient of the loss function with the model parameters as independent variables is obtained by a gradient descent algorithm, and the parameters are iterated with forward propagation and error back-propagation (BP); the choice of gradient descent method is broad, e.g., Adam and SGD are proven, reliable gradient descent algorithms.
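Putting the pieces together, a training loop under the above sketches (Adam, forward pass, back-propagation) might look like this; the learning rate and epoch count are assumptions, and `features` and `loader` come from the earlier sketches.

```python
import torch

model = CompetitiveAE(in_dim=features.shape[1])      # from the sketches above
opt = torch.optim.Adam(model.parameters(), lr=1e-3)  # Adam, as named in the text

for epoch in range(100):                             # repeat until the loss plateaus
    for x, is_pos in loader:
        rec_pos, rec_negs = model(x)
        loss = competitive_loss(x, rec_pos, rec_negs, is_pos.bool())
        opt.zero_grad()
        loss.backward()                              # error back-propagation (BP)
        opt.step()
```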
The process from preprocessing the image data set in S1 to computing the loss function in S4 is repeated until the loss finally stabilizes near some value, completing the training of the samples. Erroneously labeled data then manifest as samples whose reconstruction error under the decoder D_p is greater than that under the decoder group {D_n^(j)}.
Thanks to the large amount of supervised positive sample data, the positive-sample decoder can effectively learn the characteristics and regularities of the positive samples, so most outlier samples fall to the negative-sample decoder. The detection result then depends on a sample point's comparative reconstruction errors across the decoders. For example, for an unlabeled sample whose true attribute is an inlier, i.e., a positive sample, the error between the input features and the output reconstructed by the positive-sample decoder is smaller than that of the negative-sample decoder, and the sample is ultimately predicted to be positive.
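The decision rule above reduces to a per-sample comparison of the two reconstruction errors; a sketch under the same assumptions as the earlier blocks:

```python
import torch

@torch.no_grad()
def flag_label_errors(model, x):
    """Flag a sample as mislabeled (negative) when its D_p reconstruction error
    exceeds the best error over the negative decoder group, per the rule above."""
    rec_pos, rec_negs = model(x)
    err_pos = ((x - rec_pos) ** 2).sum(dim=1)
    err_neg = ((x.unsqueeze(0) - rec_negs) ** 2).sum(dim=2).min(dim=0).values
    return err_pos > err_neg          # True -> predicted labeling error
```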
And S5, after the training is finished, carrying out abnormity detection on the image data to be detected.
Embodiment:
according to the current technical progress, the existing data labeling error detection method has two major categories of full supervision and semi supervision forms. The method of the present invention is a complementary semi-supervised method. In this manner, a data set is first presented in which there are positive samples that are marked with a definite cluster point, and there are also other unmarked, unmarked samples. The method will eventually complement the labeling of the unlabeled sample, hence the term "complementary semi-supervised mode".
Based on the content of the invention, the specific steps of the embodiment are as follows:
(1) Collect and organize positive sample data well labeled as inliers and unlabeled data without class distinctions; clean the data and store it in structured form.
(2) Preprocess the data. For data such as images, whose raw size is large, whose dimensionality would be too high, and whose original form is unsuitable for direct use, apply common image preprocessing strategies and perform feature engineering, e.g., with a pretrained convolutional neural network such as Inception, so that the dimensionality is reduced and the features flattened; apply normalization to ensure data quality.
(3) Based on the feature size of the samples, determine the specific deep neural network models of the encoder E, the decoder D_p, and the decoder group {D_n^(j)}. The network structure is chosen to suit the input features: a convolutional neural network for two-dimensional data with local correlations, a recurrent network such as an LSTM for data with temporal attributes, and a multilayer perceptron for general data. For generality, this embodiment uses a multilayer perceptron with 3 hidden layers and ReLU as the activation function. The decoders are identical to the encoder except that the order of operations is reversed.
(4) Rearrange the training data in random order to eliminate the bias caused by data unevenness, and feed the sampled batches to the encoder for encoding. During decoding, all data points labeled as positive samples are reconstructed only by the decoder D_p, while the other, unlabeled samples are sent to the decoder group {D_n^(j)} for reconstruction. The reconstructions of the data samples are output.
(5) After the reconstructions of the sample features are obtained, optimize the loss function defined above.
(6) Choose Adam as the gradient descent algorithm to obtain the gradient of the loss function with the model parameters as independent variables, and iterate the parameters with the BP algorithm.
(7) Repeat steps (2) through (6), stopping training when the loss function finally stabilizes near some value. The criterion for identifying erroneously labeled data is that the reconstruction error under the decoder D_p is larger than that under the decoder group {D_n^(j)}.
An evaluation of the performance of the present invention is given below.
To evaluate the performance of the invention, we used two popular image-domain data sets, MNIST and USPS. MNIST is a data set of handwritten digit images with 10 classes representing the digits 0 through 9; its training set contains 60,000 samples and its evaluation set 10,000. USPS is another handwritten digit data set, independent of MNIST.
In the experiments, one class of each data set is taken as the positive class and the remaining classes as the negative class, with a sampling-intensity ratio of about 7:3.
A large pretrained convolutional neural network is used for preliminary feature extraction and dimensionality reduction on the CIFAR-10 data set; for the other data sets, the original input features are kept unchanged since their dimensionality is moderate. Adam is used as the optimization algorithm, and the model is implemented with the PyTorch framework.
For performance comparison, we selected a large number of algorithms, both classical machine learning algorithms and recent deep learning neural network algorithms, and evaluated the invention and the baselines with three metrics: error rate, F1 score, and AUC score.
Tables 1 and 2 show the metric scores of the invention and the other methods on 4 typical classes of the MNIST data set. The results show that the Local Outlier Factor algorithm is slightly better than the invention on the AUC metric, while the invention achieves the best results on all other metrics.
TABLE 1 (rendered as an image in the original publication)
TABLE 2 (rendered as an image in the original publication)
Tables 3 and 4 show the metric scores of the invention and the other methods on 4 representative classes of the USPS data set; the results are similar to those on MNIST. The invention ranks first among all compared algorithms, and only a few algorithms match it on individual metrics.
TABLE 3 (rendered as an image in the original publication)
TABLE 4 (rendered as an image in the original publication)
Corresponding to the foregoing embodiment of the method for detecting data labeling errors based on multiple decoders, the present invention further provides an embodiment of a device for detecting data labeling errors based on multiple decoders.
Referring to fig. 3, an embodiment of the present invention provides a data labeling error detection apparatus based on multiple decoders, which includes one or more processors, and is configured to implement a data labeling error detection method based on multiple decoders in the foregoing embodiment.
The embodiment of the multi-decoder-based data annotation error detection apparatus can be applied to any device with data processing capability, such as a computer. The apparatus embodiment may be implemented by software, by hardware, or by a combination of the two. Taking a software implementation as an example, as a logical device it is formed by the processor of the device reading the corresponding computer program instructions from non-volatile memory into memory and running them. In terms of hardware, Fig. 3 shows the hardware structure of a device with data processing capability on which the apparatus resides; besides the processor, memory, network interface, and non-volatile memory shown in Fig. 3, the device may also include other hardware according to its actual functions, which is not detailed here.
The implementation of the functions and actions of each unit in the above apparatus is described in detail in the implementation of the corresponding steps of the above method and is not repeated here.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the invention. One of ordinary skill in the art can understand and implement it without inventive effort.
An embodiment of the present invention further provides a computer-readable storage medium, on which a program is stored, and when the program is executed by a processor, the method for detecting data annotation errors based on multiple decoders in the foregoing embodiments is implemented.
The computer-readable storage medium may be an internal storage unit, such as a hard disk or memory, of any device with data processing capability described in the foregoing embodiments. It may also be an external storage device of such a device, such as a plug-in hard disk, a Smart Media Card (SMC), an SD card, or a flash card provided on the device. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the device. The computer-readable storage medium stores the computer program and the other programs and data required by the device, and may also be used to temporarily store data that has been or will be output.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (9)

1. A multi-decoder-based data annotation error detection method, characterized by comprising the following steps:
s1, establishing image data sets of the positive samples and the unlabelled samples, and determining the attributes of the unlabelled samples, namely classifying the unlabelled samples into the positive samples and the negative samples;
s2, the positive sample and the unmarked sample pass through an encoder to obtain the hidden layer characteristics of the image data;
s3, enabling the hidden layer characteristics to pass through a positive sample decoder and a negative sample decoder, enabling the positive sample to pass through the positive sample decoder to obtain the reconstruction result of the positive sample, enabling the unmarked sample to pass through the positive sample decoder and the negative sample decoder respectively to obtain the reconstruction result of the unmarked sample under the positive sample decoder and the negative sample decoder respectively, and performing competitive reconstruction on the sample;
s4, comparing the reconstruction errors of the positive and negative sample decoders, calculating the loss of the negative sample in the unmarked sample under the minimum value of the reconstruction errors of the positive and negative sample decoders and the reconstruction error of the positive sample jointly, and training the decoder and/or the encoder;
S5, after training is complete, perform anomaly detection on the image data to be inspected.
2. The method of claim 1, wherein in S1 the image data set is preprocessed, including dimensionality reduction, flattening, and normalization of the image features.
3. The method of claim 1, wherein in step S1, the image data is subjected to a batch sequence randomization process.
4. The method of claim 1, wherein the positive-sample decoder in S3 is a single decoder, the negative-sample decoder is a group of negative-sample decoders, and the group's outputs are fused using an attention mechanism.
5. The method of claim 4, wherein the negative sample decoder is a multi-channel negative sample decoder.
6. The method of claim 1, wherein in S4 the loss after the reconstruction errors is computed with the following loss function:

$$\mathcal{L} = \sum_{x \in X_p} \left\| x - D_p(E(x)) \right\|_2^2 + \sum_{x \in X_u} \min\left( \left\| x - D_p(E(x)) \right\|_2^2,\; \min_j \left\| x - D_n^{(j)}(E(x)) \right\|_2^2 \right)$$

where $X_p$ denotes the positive sample set, $X_u$ the unlabeled sample set, $E$ the encoder, $D_p(E(x))$ the reconstruction of a sample at the positive-sample decoder, $D_n^{(j)}(E(x))$ its reconstruction at the $j$-th member of the negative-sample decoder group, $j$ the index over the negative decoder group, and $\|\cdot\|_2^2$ the squared L2 norm.
7. The method of claim 1, wherein in S4 the gradient of the loss function with the model parameters as independent variables is obtained by a gradient descent algorithm, and the parameters are iterated by a forward propagation and error back-propagation algorithm.
8. The method of claim 1, wherein in S4 erroneously labeled data manifest as samples whose reconstruction error under the decoder D_p is greater than that under the decoder group {D_n^(j)}.
9. A multi-decoder based data annotation error detection apparatus, comprising one or more processors configured to implement the multi-decoder based data annotation error detection method of any one of claims 1-8.
CN202111654954.0A 2021-12-31 2021-12-31 Data annotation error detection method and device based on multiple decoders Pending CN114239751A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111654954.0A CN114239751A (en) 2021-12-31 2021-12-31 Data annotation error detection method and device based on multiple decoders

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111654954.0A CN114239751A (en) 2021-12-31 2021-12-31 Data annotation error detection method and device based on multiple decoders

Publications (1)

Publication Number Publication Date
CN114239751A (en) 2022-03-25

Family

ID=80744778

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111654954.0A Pending CN114239751A (en) 2021-12-31 2021-12-31 Data annotation error detection method and device based on multiple decoders

Country Status (1)

Country Link
CN (1) CN114239751A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116739047A (en) * 2023-08-16 2023-09-12 中汽信息科技(天津)有限公司 Method for constructing reconstruction model of automobile bolt tightening curve and identifying tightening quality
CN116739047B (en) * 2023-08-16 2023-10-27 中汽信息科技(天津)有限公司 Method for constructing reconstruction model of automobile bolt tightening curve and identifying tightening quality


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination