CN112766337A - Method and system for predicting correct label of crowdsourced data - Google Patents
- Publication number: CN112766337A (application CN202110028695.4A)
- Authority
- CN
- China
- Prior art keywords
- data
- label
- crowdsourcing
- neural network
- initial
- Prior art date
- Legal status: Granted (assumed; not a legal conclusion)
Classifications
- G06F18/24: Pattern recognition; classification techniques
- G06F18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06N3/08: Neural networks; learning methods
Abstract
The invention provides a method and system for predicting the correct labels of crowdsourced data. The method trains a neural network model on reference labels, where the reference label of each item of crowdsourced data is the mean of all of that item's initial labels. The model then produces a predicted label for each item, and the current model is iteratively calibrated based on the credibility of each initial label relative to the predicted label, until the model converges or its accuracy declines over consecutive iterations. The method and system reduce the dependence of deep learning on the ability of crowdsourcing workers, thereby improving the accuracy and robustness of the deep learning model.
Description
Technical Field
The invention relates to the technical field of data mining and analysis, and in particular to a method and system for predicting the correct labels of crowdsourced data.
Background
In recent years, deep learning has markedly advanced the state of the art in every branch of machine learning and has revolutionized the field. As supervised artificial neural networks continue to grow in scale, deep learning's demand for accurately labeled data sets for learning feature representations grows as well. Crowdsourcing can acquire large amounts of labeled data in a short time by distributing labeling tasks to different workers; it drastically reduces labeling cost and is a fast, effective, and cheap way to obtain labels, so it is widely used for labeling large-scale data. However, crowdsourcing introduces many non-expert workers, and factors such as sample difficulty and worker ability leave varying degrees of noise in the labels.
In response to these problems, many scholars and researchers have conducted related studies. For example, Chinese patent application CN201711113706.9 discloses a method for obtaining crowdsourcing cost complexity: a task allocation module assigns crowdsourcing tasks to a group of workers selected from a worker pool and, in doing so, obtains the probability distribution of the ability of the participating workers together with the variance and expectation of that distribution; a parameter learning model learns the worker parameters; a result aggregation module collects the task results; and the cost complexity of the crowdsourcing process is derived from the worker parameters. Chinese patent application CN201510958745.3 discloses a crowdsourcing annotation integration method that defines a generalized inverse Gaussian distribution from a regularization hyperparameter, a margin hyperparameter, the annotators' voting weights, and the difference between the number of times an annotator labels the current prediction item with the estimated value and the number of times the annotator labels it with the second-ranked category; it samples an auxiliary parameter from this distribution and uses the auxiliary parameter to update the annotators' weights, markedly enhancing the model's discrimination ability. It then integrates the traditional majority-voting annotation model with the confusion-matrix model, further describing the data-generation process more comprehensively.
Chinese patent application CN201910770300.0 discloses a deep learning target detection method and system based on repeated crowdsourced labels. It first receives original training-set pictures from the application scene and collects their labels; it then preprocesses the pictures to obtain preprocessed data; it then trains a crowdR-CNN target detection model on the preprocessed data, adding a label aggregation layer on top of the two-stage model so that the true class of the target is inferred from each annotator's individual sensitivity, and obtains prediction results from the crowdR-CNN network on the detection data. In addition, a deep learning crowdsourcing method has been proposed in which a crowd layer is added after the output layer; the crowd layer models the ability of crowdsourcing workers so as to convert between true labels and crowdsourced labels, allowing crowdsourced data to be processed end to end.
However, existing deep learning crowdsourcing methods typically rely on worker ability to make inferences. Because worker ability is difficult to estimate, and estimates of a worker's ability on specific sample data are often inaccurate, inferring sample labels from worker ability frequently leads to unsatisfactory final results.
Therefore, a method and system for predicting the correct labels of crowdsourced data are needed.
Disclosure of Invention
Therefore, an object of the embodiments of the present invention is to overcome the above drawbacks of the prior art by providing a method and system for predicting the correct labels of crowdsourced data, so as to reduce the dependence on worker ability in crowdsourced data and thereby improve the accuracy and robustness of the deep learning model.
The above purpose is realized by the following technical scheme:
According to a first aspect of the embodiments of the present invention, there is provided a model training method for predicting correct labels of crowdsourced data, including: obtaining a crowdsourced data set, wherein each item of crowdsourced data in the set has a number of initial labels; obtaining a reference label for each item of crowdsourced data based on the mean of all of its initial labels, so as to train a neural network model; and obtaining a predicted label for each item of crowdsourced data using the neural network model and calibrating the model based on the credibility of each initial label of each item relative to its predicted label, until the neural network model converges or its accuracy continues to decline.
In one embodiment, calibrating the neural network model based on the credibility of each initial label of each item of crowdsourced data relative to the predicted label comprises: taking the credibility of each initial label of each item relative to the predicted label as that initial label's sampling weight; and performing weighted sampling of (crowdsourced data, initial label) pairs according to the sampling weights, and retraining the neural network model.
In one embodiment, the method further comprises: normalizing each item of crowdsourced data together with its reference label and each of its initial labels, using the mean and standard deviation of the reference labels of the crowdsourced data in the set, so as to train the neural network model.
In one embodiment, performing weighted sampling of each item of crowdsourced data and its initial labels according to the sampling weights and retraining the neural network model is equivalent to changing the loss function of the neural network model to:

L = (1/N) Σ_i Σ_j q̂_ij (f(x̂_i) − ŷ_ij)²

where q̂_ij is the normalized credibility of the jth initial label of the ith item of crowdsourced data, f(x̂_i) is the label predicted by the neural network model for the normalized ith item x̂_i, and ŷ_ij is the normalized jth initial label of the ith item.
In one embodiment, the credibility of each initial label of each item of crowdsourced data relative to the predicted label is obtained through a Gaussian kernel function:

q_ij = e^(−(ŷ_ij − f(x̂_i))² / (2σ²))

where f(x̂_i) is the label predicted by the neural network model for the normalized ith item of crowdsourced data, ŷ_ij is the normalized jth initial label of the ith item, e is the natural constant, and σ² is a preset fixed parameter.
Another aspect of the present invention provides a method for predicting the correct labels of crowdsourced data, comprising: obtaining a crowdsourced data set, each item of which has a number of initial labels; and obtaining a predicted label for each item in the set using a neural network model obtained by any one of the above training methods, and taking the predicted label as the correct label of the corresponding item.
Another aspect of the present invention provides a system for predicting the correct labels of crowdsourced data, comprising: an interface module for acquiring a crowdsourced data set, the set comprising a training data set and a test data set, each item of crowdsourced data having a number of initial labels; a training module for obtaining a reference label for each item of training data based on the mean of all of its initial labels, so as to train a neural network model; a calibration module for obtaining a predicted label for each item of training data using the neural network model and calibrating the model based on the credibility of each initial label relative to the predicted label, until the neural network model converges or its accuracy continues to decline; and a prediction module for obtaining a predicted label for each item of test data using the calibrated neural network model and taking it as the correct label of the corresponding item.
Another aspect of the invention provides a storage medium in which a computer program is stored which, when being executed by a processor, is operable to carry out the method of any one of the preceding claims.
Another aspect of the invention provides an electronic device comprising a processor and a memory, the memory having stored therein a computer program operable to, when executed by the processor, implement the method of any one of the above.
The technical scheme of the invention can comprise the following beneficial effects:
The method avoids the situation in which the deep learning model has low accuracy because worker ability is poorly estimated, for example when an individual worker labels only a small amount of sample data or labels different sample data with inconsistent accuracy. It effectively reduces the degree to which a deep learning method using crowdsourced data depends on worker ability, yields a more robust deep learning model even when there are many workers and each worker provides few labels, and maintains relatively high accuracy in a real crowdsourcing environment.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:
FIG. 1 illustrates a flow diagram of a model training method for predicting correct labels for crowd-sourced data, in accordance with one embodiment of the invention;
FIG. 2 illustrates a block diagram of a system for predicting the correct label of crowd-sourced data, according to one embodiment of the invention;
FIG. 3 shows the true labels of the function data constructed in the experiment and the initial labels given by workers, according to one embodiment of the invention;
FIG. 4 illustrates the deep learning model used for the simulation dataset in the experiment, according to one embodiment of the invention;
FIG. 5 illustrates the deep learning model used for the MovieReviews dataset in the experiment, according to one embodiment of the invention;
FIG. 6 shows how R² varies with training epoch in the experiment, according to one embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail by embodiments with reference to the accompanying drawings. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations or operations have not been shown or described in detail to avoid obscuring aspects of the invention.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
In order to solve the above problems in the prior art, the present invention provides a method and system for predicting the correct labels of crowdsourced data. The method employs a neural network model to predict correct labels, effectively addressing the heavy reliance of deep learning crowdsourcing methods on accurate estimates of worker ability.
FIG. 1 shows a flow diagram of a model training method for predicting correct labels of crowdsourced data, in accordance with an embodiment of the invention. As shown in FIG. 1, the method comprises two stages: an initial model training stage and a model calibration stage. In the initial training stage (steps S110-S120), a crowdsourced data set is obtained from a crowdsourcing platform, the several initial labels of each item of crowdsourced data are aggregated by taking their mean to produce the item's reference label, and the initial model is trained on the crowdsourced data and the corresponding reference labels. The model calibration stage (steps S130-S150) adopts a parameter estimation strategy in the style of the Expectation-Maximization (EM) algorithm: in the E step, the current model produces a predicted label for each item of crowdsourced data; in the M step, the current model is calibrated based on the credibility of each initial label of each item relative to the predicted label. These steps are iterated until the model converges or its accuracy declines over consecutive iterations, yielding a more robust model. The specific steps are as follows:
step S110, a crowdsourcing data set is obtained.
Crowdsourced data sets for different learning tasks may be obtained from existing crowdsourcing platforms. Each item of crowdsourced data in the set has a number of initial labels from different workers, and the number of initial labels may be the same or different across items.
Step S120, obtaining a reference label of the corresponding crowdsourcing data based on a mean value of all initial labels of each crowdsourcing data, so as to train the neural network model.
In one embodiment, all initial labels of each item of crowdsourced data in the set are averaged and the mean is used as that item's reference label; the neural network model is then trained with each item of crowdsourced data as input and its reference label as the target output. For example, let the set of all initial labels of the ith item of crowdsourced data x_i be {y_i1, y_i2, ..., y_in_i}. The mean m_i of all initial labels of x_i is taken as the reference label of x_i:

m_i = (1/n_i) Σ_{j=1}^{n_i} y_ij

where i indexes the items of crowdsourced data, j indexes the initial labels, y_ij is the jth initial label of the ith item, and n_i is the number of initial labels of the ith item.
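The reference-label computation above can be sketched as follows (a minimal illustration with made-up labels; the variable names are not from the patent):

```python
import numpy as np

# Hypothetical initial labels: initial_labels[i] holds the labels that
# different workers gave the i-th item; items may have different counts n_i.
initial_labels = [
    [2.0, 3.0, 4.0],        # item 0: three workers
    [1.0, 1.5],             # item 1: two workers
    [5.0, 4.0, 6.0, 5.0],   # item 2: four workers
]

# Reference label m_i = (1/n_i) * sum_j y_ij, the mean of item i's labels.
reference_labels = np.array([np.mean(labels) for labels in initial_labels])
```

The resulting `reference_labels` array ([3.0, 1.25, 5.0] for this toy input) is what the initial neural network model is trained to predict.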
In one embodiment, the mean of all initial labels of each item of crowdsourced data in the set may be normalized, and the normalized mean used as the item's reference label to train the neural network model. In one embodiment, each item of crowdsourced data, its reference label, and each of its initial labels may be normalized by the mean and standard deviation of the reference labels of all crowdsourced data in the set:

m̂_i = (m_i − ȳ) / y_std,  ŷ_ij = (y_ij − ȳ) / y_std

where m_i is the reference label of the ith item of crowdsourced data, ȳ is the mean of the reference labels of all crowdsourced data in the set, and y_std is the standard deviation of those reference labels.
The normalized crowdsourced data and reference labels {(x̂_i, m̂_i)} are used to train the initial model, yielding the model f, whose loss function can be expressed as:

L = (1/N) Σ_{i=1}^{N} (f(x̂_i) − m̂_i)²

where N is the number of items of crowdsourced data in the set, f(x̂_i) is the label predicted by the model f for the normalized ith item x̂_i, and m̂_i is the normalized reference label of the ith item.
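A minimal sketch of the normalization and initial loss, using the toy reference labels from before and stand-in model outputs (all values and names are illustrative, not from the patent):

```python
import numpy as np

m = np.array([3.0, 1.25, 5.0])        # reference labels m_i (toy values)
y_mean, y_std = m.mean(), m.std()     # mean and std over all reference labels
m_hat = (m - y_mean) / y_std          # normalized reference labels

# Stand-in model outputs f(x_i); a real implementation would use a network.
preds = np.array([0.1, -1.2, 1.0])

# L = (1/N) * sum_i (f(x_i) - m_i)^2, the MSE loss of the initial model.
loss = np.mean((preds - m_hat) ** 2)
```

The same `y_mean`/`y_std` pair would also normalize the individual initial labels ŷ_ij used later during calibration.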
Step S130, obtaining a prediction label of each crowdsourcing data by using the neural network model.
As described above, the present invention employs an EM-like parameter estimation strategy during the model calibration stage. EM is an iterative optimization algorithm for maximum likelihood estimation, used for parameter estimation in probabilistic models with latent variables or missing data. The standard computational framework of the EM algorithm alternates between E and M steps, and the convergence of the algorithm guarantees that the iterations approach at least a local maximum.
This step is similar to the E step in the EM algorithm, i.e., the predictive label of each crowdsourced data in the crowdsourced data set is obtained using the current neural network model.
Step S140, calibrating the current neural network model based on the credibility of each initial label of each crowdsourcing data relative to the predicted label.
In the initial model training process, every initial label of every item of crowdsourced data carries equal weight; however, because workers differ in ability, the initial labels given by different workers differ in credibility. To account for this, in one embodiment, the credibility of each initial label of each item relative to the label predicted by the model is used as that initial label's sampling weight; (item, initial label) pairs are then sampled with these weights, and the neural network model is retrained.
According to one embodiment of the invention, the ith item of crowdsourced data may be replicated n_i times, where n_i is the number of its initial labels, pairing each copy with one of the initial labels so as to expand the ith item and its labels into the pairs {(x_i, y_i1), ..., (x_i, y_in_i)}. These pairs are then normalized with the mean ȳ and standard deviation y_std to obtain the normalized pairs {(x̂_i, ŷ_ij)}. Each normalized initial label ŷ_ij of the normalized ith item x̂_i is compared with the label f(x̂_i) predicted for that item by the current model to obtain its credibility, denoted q_ij, and the sampling weights of the initial labels of the ith item are then constructed from these credibilities.
In an embodiment, the credibility of each initial label of each item may be normalized, and the normalized credibility used as the sampling weight for weighted sampling of (item, initial label) pairs when retraining the model. The credibility is normalized per item as:

q̂_ij = q_ij / Σ_{j=1}^{n_i} q_ij

where q_ij is the credibility of the jth initial label of the ith item of crowdsourced data.
Taking the credibility of each initial label as its sampling weight and performing weighted sampling of (item, initial label) pairs yields batches of training data for retraining the current model f. This weighted-sampling training can be understood as changing the model's loss function to:

L = (1/N) Σ_i Σ_j q̂_ij (f(x̂_i) − ŷ_ij)²

where q̂_ij is the normalized credibility of the jth initial label of the ith item of crowdsourced data, f(x̂_i) is the label predicted by the current neural network model for the normalized ith item x̂_i, and ŷ_ij is the normalized jth initial label of the ith item.
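A small numeric sketch of this weighted loss (all numbers made up for illustration):

```python
# Each normalized initial label y_hat[i][j] contributes to the loss with
# its normalized credibility q_hat[i][j] as weight.
preds = [0.2, -0.5]                        # f(x_i) for two items (toy values)
y_hat = [[0.1, 0.4], [-0.6, -0.3, -0.5]]   # normalized initial labels
q_hat = [[0.7, 0.3], [0.5, 0.2, 0.3]]      # normalized credibilities (sum to 1 per item)

N = len(preds)
loss = sum(
    q * (preds[i] - y) ** 2
    for i in range(N)
    for y, q in zip(y_hat[i], q_hat[i])
) / N
```

Drawing (item, label) pairs in proportion to q̂_ij and training with an unweighted squared error has this weighted loss as its expectation, which is the sense in which the text calls the two equivalent.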
In one embodiment, the credibility of each initial label of each item relative to the predicted label may be obtained through a Gaussian kernel function. Given that relatively many workers participate in labeling the ith item of crowdsourced data x̂_i, it is assumed that each participating worker is rational and that the initial labels they give follow a normal distribution N(f(x̂_i), σ_i²), where f(x̂_i) is the predicted label of the ith item obtained with the current model and σ_i² is the variance of the labels on the ith item, which can be understood as label fluctuation caused by the item's difficulty. Assuming every item has the same difficulty, σ_i² = σ² for all i. The probability of generating the initial label ŷ_ij is then:

p(ŷ_ij) = (1 / √(2πσ²)) e^(−(ŷ_ij − f(x̂_i))² / (2σ²))

where f(x̂_i) is the predicted label of the normalized ith item obtained with the current model, ŷ_ij is the normalized jth initial label of the ith item, e is the natural constant, and σ² is the variance of the labels on the crowdsourced data.
Based on the probability of generating the initial label ŷ_ij above, the credibility of ŷ_ij can be obtained by fixing σ² and defining a Gaussian kernel function as the credibility measure:

q_ij = e^(−(ŷ_ij − f(x̂_i))² / (2σ²))

where f(x̂_i) is the predicted label of the normalized ith item obtained with the current model, ŷ_ij is the normalized jth initial label of the ith item, e is the natural constant, and σ² is a preset fixed parameter.
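The Gaussian-kernel credibility and its per-item normalization can be sketched as follows (the value of σ² is an assumption for illustration, not one given in the patent):

```python
import math

def credibility(pred, y, sigma2=1.0):
    # q_ij = e^(-(y_ij - pred_i)^2 / (2*sigma^2)); equals 1 when label == prediction
    return math.exp(-((y - pred) ** 2) / (2.0 * sigma2))

def normalized_credibilities(pred, labels, sigma2=1.0):
    # q_hat_ij = q_ij / sum_j q_ij, so each item's weights sum to 1
    q = [credibility(pred, y, sigma2) for y in labels]
    total = sum(q)
    return [qi / total for qi in q]

weights = normalized_credibilities(pred=0.0, labels=[0.0, 0.5, 2.0])
```

Labels close to the model's prediction receive weights near the maximum, while outlying labels are discounted smoothly rather than discarded outright.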
Step S150, steps S130-S140 are repeated until the neural network model converges or its accuracy continues to decline.
That is, the calibrated neural network model is repeatedly used to obtain a predicted label for each item of crowdsourced data, and the current model is retrained based on the credibility of each initial label of each item relative to the current predicted label, until the model converges or its accuracy declines over consecutive iterations.
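The whole E/M-style calibration loop can be sketched end to end on a toy regression problem. Here a weighted least-squares line stands in for the neural network, and all data and parameters (including σ² = 0.1) are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy crowdsourced data: 50 items x, true labels 2x, 5 noisy workers each.
x = np.linspace(0.0, 1.0, 50)
labels = np.stack([2 * x + rng.normal(0, 0.3, x.size) for _ in range(5)], axis=1)

def fit(xs, ys, weights):
    # Weighted least-squares line a*x + b, standing in for network training.
    A = np.stack([xs, np.ones_like(xs)], axis=1)
    w = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(A * w[:, None], ys * w, rcond=None)
    return lambda t: coef[0] * t + coef[1]

# Initial training (S110-S120): reference label = per-item label mean.
model = fit(x, labels.mean(axis=1), np.ones(x.size))

sigma2 = 0.1
for _ in range(5):                                   # calibration rounds (S130-S150)
    preds = model(x)                                 # E step: predicted labels
    q = np.exp(-(labels - preds[:, None]) ** 2 / (2 * sigma2))
    q /= q.sum(axis=1, keepdims=True)                # normalized credibilities
    xs = np.repeat(x, labels.shape[1])               # replicate each item n_i times
    model = fit(xs, labels.ravel(), q.ravel())       # M step: weighted retraining
```

A real implementation would replace `fit` with neural network training via weighted sampling, and would stop early if held-out accuracy declined over consecutive rounds, as the text describes.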
Based on the crowdsourced data and their corresponding labels, a corresponding likelihood function is obtained, and the parameter estimates are obtained by the maximum likelihood method. Steps S130-S150 are thus analogous to the M step of the EM algorithm.
In this embodiment, the calibration update of the model is realized by a parameter estimation strategy similar to the EM framework, so that a more robust deep learning model is obtained on the basis of the existing model. However, since the EM framework itself depends to some extent on the initial values, and deep learning training has a certain instability (for example, jitter caused by an inappropriate learning rate, or jitter caused by currently inappropriate sample weights), this EM-like parameter estimation strategy may itself be somewhat unstable, which distinguishes it from a true EM framework. Deep learning, however, tolerates errors to a degree. Given a reasonably well initialized deep model (in practice, the accuracy of the initial model trained on mean labels is within an acceptable range), the labels of the crowdsourced data can be predicted relatively well, which yields a relatively good estimate of the confidence of the initial labels, and finally a more robust and accurate deep model; this is a virtuous cycle. Therefore, when the final model converges, or the accuracy decreases over several consecutive iterations due to instability, the iteration ends, and the model before convergence or before the accuracy decrease is taken as the final model for predicting the correct labels of crowdsourced data.
This embodiment effectively alleviates the low accuracy of worker-capability estimation caused by conditions common in crowdsourced data, such as individual workers labeling only a small number of samples or labeling different samples with inconsistent accuracy, and thereby improves the accuracy and robustness of the deep learning model.
In one embodiment, a method of predicting correct labels for crowd sourced data is provided, comprising: obtaining a crowdsourcing data set, wherein each crowdsourcing data in the crowdsourcing data set is provided with a plurality of initial labels, then obtaining a prediction label of each crowdsourcing data in the crowdsourcing data set by using the neural network model trained by the training method, and taking the prediction label as a correct label of the corresponding crowdsourcing data.
In one embodiment, a system for predicting correct labels for crowd sourced data is also provided. Fig. 2 is a schematic structural diagram of a system for predicting correct labels of crowd-sourced data according to one embodiment of the invention. As shown in fig. 2, the system 200 includes an interface module 201, a training module 202, a calibration module 203, and a prediction module 204. Although the block diagrams depict components in a functionally separate manner, such depiction is for illustrative purposes only. The components shown in the figures may be arbitrarily combined or separated into separate software, firmware, and/or hardware components. Moreover, regardless of how such components are combined or divided, they may execute on the same computing device or multiple computing devices, which may be connected by one or more networks.
The interface module 201 is configured to obtain a crowdsourcing data set, where the crowdsourcing data set includes a training data set and a testing data set, and each crowdsourcing data in the crowdsourcing data set has a plurality of initial tags. The training module 202 is configured to obtain a reference label of the corresponding training data based on an average of all initial labels of each training data in the training data set, so as to train the neural network model. The calibration module 203 is configured to obtain a predicted label of each training data by using the neural network model, and calibrate the neural network model based on the reliability of each initial label of each training data with respect to the predicted label until the neural network model converges or the accuracy continues to decrease. The prediction module 204 is configured to obtain a prediction label of each test data in the test data set by using the calibrated neural network model, and use the prediction label as a correct label of the corresponding crowdsourcing data.
In another embodiment of the present invention, a computer-readable storage medium is further provided, on which a computer program or executable instructions are stored, and when the computer program or the executable instructions are executed, the technical solution as described in the foregoing embodiments is implemented, and the implementation principle thereof is similar, and is not described herein again. In embodiments of the present invention, the computer readable storage medium may be any tangible medium that can store data and that can be read by a computing device. Examples of computer readable storage media include hard disk drives, Network Attached Storage (NAS), read-only memory, random-access memory, CD-ROMs, CD-R, CD-RWs, magnetic tapes, and other optical or non-optical data storage devices. The computer readable storage medium may also include computer readable media distributed over a network coupled computer system so that computer programs or instructions may be stored and executed in a distributed fashion.
Experimental part
To further verify the validity of the proposed method and system for predicting correct labels of crowdsourced data, the inventors performed experiments on a simulation data set and a real data set [Rodrigues, Filipe; Pereira, Francisco]. The construction of the simulation data set is described below. The real data set is the public data set MovieReviews [downloaded from: http://fprodrigues.com/deep_MovieReviews.tar.gz], which contains 5006 movie reviews; the goal is to score movies according to their reviews, with scores between 1 and 10. Using the AMT platform, the data set publishers collected scores for 1500 reviews from 137 workers, with 4.96 answers per review on average; the remaining 3506 movie reviews were used for testing.
1) Simulation dataset construction
In the simulation data, we construct three functions: y = x³ with x ∈ [-2, 2]; y = 5·sin(x) with x ∈ [-8, 8]; and a third function. The probability distribution parameters with which each worker generates labels differ across intervals; 5 workers are constructed for the first function and 4 workers for each of the last two. For example, the first worker for the first function generates its sample label y₁ according to N(y+5, 10) on [-2, -1], and according to N(y+1, 0.5) on (-1, 2]. The specific constructed data sets and the generated worker labels are shown in FIG. 3.
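The piecewise worker simulation can be sketched as follows for the first worker of the first function (a hedged sketch: only this worker's bias/variance values appear in the text, the other workers are analogous, and the random seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def first_worker_labels(x, y):
    # Piecewise label generation: N(y+5, 10) on [-2, -1],
    # N(y+1, 0.5) on (-1, 2].  The text gives N(mean, variance);
    # numpy's `normal` takes a standard deviation, hence the sqrt.
    return np.where(x <= -1,
                    rng.normal(y + 5, np.sqrt(10.0)),
                    rng.normal(y + 1, np.sqrt(0.5)))

x = np.linspace(-2.0, 2.0, 401)
y = x ** 3                       # ground truth of the first function
labels = first_worker_labels(x, y)
```

Repeating this for each worker (with its own interval-specific parameters) yields the multi-label crowdsourced data set visualized in FIG. 3.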
2) Deep learning model construction
For the simulation data set, because the functions to be fitted are relatively simple, a relatively simple deep learning model is selected. The model comprises four layers: an input layer and three fully connected layers. The input real number is first mapped to a fully connected layer of 50 neurons; the 50-dimensional real vector output by that layer is fed as input into a second fully connected layer of 50 neurons, which is finally connected to the output layer corresponding to the real-valued output of the function. The specific model is shown in FIG. 4. The network uses MSE as its cost function.
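A numpy sketch of this 1 → 50 → 50 → 1 fully connected network follows (the ReLU activation and the initialisation scale are assumptions, since the text does not specify them):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp():
    # Weight/bias pairs for the 1 -> 50 -> 50 -> 1 network above.
    sizes = [(1, 50), (50, 50), (50, 1)]
    return [(rng.normal(0.0, 0.1, s), np.zeros(s[1])) for s in sizes]

def forward(params, x):
    # Forward pass; ReLU on the hidden layers, linear output.
    h = np.asarray(x, dtype=float).reshape(-1, 1)
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)
    return h.ravel()

def mse(y_pred, y_true):
    # The network's cost function.
    return float(np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2))

params = init_mlp()
y_hat = forward(params, np.linspace(-2.0, 2.0, 5))
```

In practice the same architecture would of course be built and trained in a deep learning framework; the sketch only makes the layer shapes and the MSE cost concrete.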
For the real data set MovieReviews, the model construction is slightly more complex because it is a natural language processing problem. Each word is first mapped to an integer, and every review is padded with zeros to a common length. The integer corresponding to each word is then mapped to a word vector through an Embedding layer to serve as the initial features; to save time, the Embedding parameters are fixed, with the glove300B parameters used as the Embedding matrix. The deep network then learns and extracts features automatically: this part comprises a 3×3 convolutional layer with 128 features, a 5×5 pooling layer, a 5×5 convolutional layer with 128 features, another 5×5 pooling layer, and a fully connected layer of 32 neural units, finally connected to the output. The specific model is shown in FIG. 5. The network uses MSE as its cost function.
3) Comparison method
In order to compare the label-confidence-based deep learning crowdsourcing method (denoted method four) with existing methods, the invention uses one basic method and two recent deep learning crowdsourcing methods as comparison methods, respectively:

An algorithm using the mean as the sample label (denoted method one)

A deep learning crowdsourcing method corrected by Crowdlayer(+B) (denoted method two)

A deep learning crowdsourcing method initialized with the mean and corrected by Crowdlayer(+B) (denoted method three)
4) Evaluation index
Corr: pearson product-moment correlation coefficient, which is used to measure the correlation (linear correlation) between two variables X and Y, has a value between-1 and 1. Generally in the regression problem, the larger the value the better.
Mae: the average absolute error is an average value of absolute errors, and can well reflect the actual situation of predicted value errors. Generally, the smaller the value, the better.
RMSE: root mean square error, measure the deviation between observed and true values. Is often used as a measure of the prediction outcome of the machine learning model. Generally, the smaller the value, the better.
·R2The R side is generally used for describing the good and bad fitting degree of data to the model, the maximum value is 1, and the larger the value is, the better the fitting is generally.
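The four indices can be computed with plain numpy (a sketch; `regression_metrics` is an illustrative helper, not part of the described system):

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    # Corr, Mae, RMSE and R^2 as used in the experiments.
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    corr = np.corrcoef(y_true, y_pred)[0, 1]      # Pearson r, in [-1, 1]
    mae = float(np.mean(np.abs(err)))             # mean absolute error
    rmse = float(np.sqrt(np.mean(err ** 2)))      # root mean square error
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot                    # coefficient of determination
    return corr, mae, rmse, r2
```

Note that a prediction can have perfect correlation yet a poor R² (a constant offset leaves Corr at 1 but inflates MAE, RMSE, and the residual sum of squares), which is why the tables report all four indices.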
5) Analysis of Experimental results
Simulation data set experimental result analysis
Because the simulation data set is relatively simple and the differences between the results are obvious, the experiment was carried out only once; the specific results are shown in Table 1 and Table 2.
Function 1 | Corr | Mae | RMSE | R²
---|---|---|---|---
Method one | 0.986 | 2.186 | 2.246 | 0.450
Method two | 0.985 | 2.144 | 2.214 | 0.465
Method three | 0.988 | 2.129 | 2.185 | 0.479
Method four | 0.999 | 0.981 | 0.997 | 0.892

Function 2 | Corr | Mae | RMSE | R²
---|---|---|---|---
Method one | 0.960 | 2.478 | 2.690 | 0.432
Method two | 0.950 | 2.694 | 2.938 | 0.322
Method three | 0.929 | 2.952 | 3.298 | 0.146
Method four | 0.989 | 0.797 | 0.960 | 0.928

Function 3 | Corr | Mae | RMSE | R²
---|---|---|---|---
Method one | 0.969 | 2.456 | 2.699 | 0.617
Method two | 0.976 | 2.637 | 2.809 | 0.585
Method three | 0.975 | 2.890 | 3.029 | 0.517
Method four | 0.996 | 1.261 | 1.319 | 0.908

Table 1: Simulation data set experimental results

Table 2: Simulation data set experimental result images
As can be seen from Tables 1 and 2, in all three experiments method four, i.e. the label-confidence-based deep learning crowdsourcing method of the present invention, performs best, and its experimental effect is much better than learning from the mean or correcting with Crowdlayer. In experiment 1, methods two and three perform slightly better than method one, while in experiments 2 and 3 they perform even worse than method one. Considering the construction process, it is easy to see that in the data sets constructed from the three functions, each worker's capability is inconsistent across intervals. Current crowdsourcing methods, when modeling worker capability, generally assume that a worker judges all samples equally well, or only consider a simple combination of sample difficulty and overall worker capability; this leads to wrong estimates of a worker's capability on individual samples, so that the crowdsourcing learning result is not even better than a model trained on mean labels. In the method of the present invention, since worker capability is hard to estimate, the confidence of each label is considered directly, and the initial model trained on mean labels is corrected directly, yielding a more robust and accurate prediction model.
Analysis of MovieReviews data set Experimental results
Because the real data set MovieReviews is a natural language processing data set labeled by real people, various conditions can occur in the labels: a single worker provides few labels, different workers label the same sample with different ability, the same worker labels different samples with different ability, and so on. The many erroneous labels therefore make real training unstable. Ten experiments were accordingly performed on this data set using methods one, three, and four, and the overall results were observed and compared; in addition, to observe the defects of the existing method in detail, the per-epoch results of methods three and four are displayed. Specific results are shown in Table 3 and FIG. 6.
From Table 3 we can see that all three methods fluctuate somewhat over the ten experiments; relatively speaking, method three fluctuates the most and method four is the most stable. Moreover, in this regression problem method four performs best: its R² is superior to that of methods one and three. Although method three is initialized with the mean label, its pre-training period is short because the Crowdlayer component must also be trained, and a single bias term cannot represent a worker's capability; its capability estimates are therefore inaccurate and its results mediocre. In addition, because of the added Crowdlayer, method three cannot conveniently and quickly estimate the model's accuracy on the validation set in each iteration, making the training process (such as changes to the learning rate) hard to control. Method four is trained on the basis of method one, so the result of method one influences method four to some extent: when method one's result is good, method four is well initialized, so the judged label confidences are relatively accurate and the calibrated model is more robust and accurate. As can be seen from FIG. 6, the R² of method four changes regularly during training: when the previous model parameter estimate is relatively good, the label confidence estimates are relatively accurate, so credible labels obtain larger sampling weights and the accuracy of the calibrated model improves. Because some jitter may occur during training, a poor parameter estimate at some point makes the confidence estimates inaccurate and increases the weight of incredible or bad labels, so the model's accuracy worsens continuously; in that case the iteration needs to end early. From the R² variation of method three we can see that its R² jitters strongly and is severely unstable. Therefore, the label-confidence-based deep learning crowdsourcing method, which in many conditions does not estimate worker capability but directly examines the labels of the sample data, yields a more robust and accurate deep learning model.
Experiment | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Average
---|---|---|---|---|---|---|---|---|---|---|---
Method one | 0.396 | 0.348 | 0.365 | 0.423 | 0.359 | 0.331 | 0.430 | 0.359 | 0.346 | 0.417 | 0.377
Method three | 0.406 | 0.416 | 0.388 | 0.386 | 0.321 | 0.357 | 0.239 | 0.425 | 0.277 | 0.404 | 0.362
Method four | 0.420 | 0.372 | 0.391 | 0.437 | 0.345 | 0.364 | 0.444 | 0.416 | 0.395 | 0.434 | 0.402

Table 3: Results of ten experiments on MovieReviews (R²)
Reference in the specification to "various embodiments," "some embodiments," "one embodiment," or "an embodiment," etc., means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. Thus, appearances of the phrases "in various embodiments," "in some embodiments," "in one embodiment," or "in an embodiment," or the like, in various places throughout this specification are not necessarily referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. Thus, a particular feature, structure, or characteristic illustrated or described in connection with one embodiment may be combined, in whole or in part, with a feature, structure, or characteristic of one or more other embodiments without limitation, as long as the combination is not logical or operational.
The terms "comprises," "comprising," and "having," and similar referents in this specification, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements but may alternatively include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. The word "a" or "an" does not exclude a plurality. Additionally, the various elements of the drawings of the present application are merely schematic illustrations and are not drawn to scale.
Although the present invention has been described by the above embodiments, the present invention is not limited to the embodiments described herein, and various changes and modifications may be made without departing from the scope of the present invention.
Claims (9)
1. A model training method for predicting correct labels for crowd sourced data, comprising:
1) obtaining a crowdsourcing data set, wherein each crowdsourcing data in the crowdsourcing data set has a plurality of initial labels;
2) acquiring a reference label of corresponding crowdsourcing data based on the average value of all initial labels of each crowdsourcing data so as to train a neural network model;
3) obtaining a predictive label for each of the crowd-sourced data using the neural network model, and calibrating the neural network model based on a confidence level of each initial label for each of the crowd-sourced data relative to the predictive label;
4) repeating step 3) until the neural network model converges or the accuracy continuously decreases.
2. The training method of claim 1, wherein said calibrating the neural network model based on the confidence of each initial label of the each crowdsourcing data relative to the predicted label comprises:
taking the credibility of each initial label of each crowdsourcing data relative to the prediction label as a sampling weight of each initial label of each crowdsourcing data;
and carrying out weighted sampling on each crowdsourcing data and the initial label corresponding to the sampling weight according to the sampling weight, and retraining the neural network model.
3. The training method of claim 2, wherein step 2) further comprises:
and normalizing each crowdsourcing data and the reference label and each initial label thereof by using the mean value and the standard deviation of the reference labels of all crowdsourcing data in the crowdsourcing data set so as to train a neural network model.
4. The training method of claim 3, wherein the weighted sampling of the each crowdsourced data and the initial label corresponding to the sampling weight according to the sampling weight, and the retraining of the neural network model comprises:
weighted sampling is equivalent to changing the loss function of the neural network model to:

$$Loss = \sum_i \sum_j \tilde{w}_i^j \left( \hat{y}_i - \tilde{y}_i^j \right)^2$$

where $\tilde{w}_i^j$ is the confidence of the jth initial label of the ith crowdsourced data after normalization, $\hat{y}_i$ is the predicted label output by the neural network model for the normalized ith crowdsourced data $\tilde{x}_i$, and $\tilde{y}_i^j$ is the jth initial label of the ith crowdsourced data after normalization.
5. The training method of claim 3, wherein the confidence of each initial label of each crowdsourced data relative to the predicted label is obtained by a Gaussian kernel function, with the formula:

$$\tilde{w}_i^j = e^{-\frac{(\tilde{y}_i^j - \tilde{y}_i^{*})^2}{2\sigma^2}}$$

where $\tilde{w}_i^j$ is the confidence of the jth initial label of the ith crowdsourced data, $\tilde{y}_i^{*}$ is the normalized predicted label of the ith crowdsourced data obtained using the current model, $\tilde{y}_i^j$ is the jth normalized initial label of the ith crowdsourced data, e is the natural constant, and $\sigma^2$ is a preset fixed parameter.
6. A method for predicting correct labeling of crowd sourced data, comprising:
obtaining a crowdsourcing data set, each crowdsourcing data in the crowdsourcing data set having a number of initial labels;
obtaining a predictive label for each crowdsourcing data in the crowdsourcing data set using a neural network model obtained by the training method of any one of claims 1-5, and using the predictive label as a correct label for each corresponding crowdsourcing data.
7. A system for predicting correct labeling of crowd sourced data, comprising:
the system comprises an interface module, a data processing module and a data processing module, wherein the interface module is used for acquiring a crowdsourcing data set, the crowdsourcing data set comprises a training data set and a testing data set, and each crowdsourcing data in the crowdsourcing data set is provided with a plurality of initial labels;
the training module is used for acquiring a reference label of corresponding training data based on the average value of all initial labels of each training data in the training data set so as to train a neural network model;
a calibration module, configured to obtain a predicted label of each training data by using the neural network model, and calibrate the neural network model based on the reliability of each initial label of each training data with respect to the predicted label, until the neural network model converges or the accuracy continuously decreases;
and the prediction module is used for obtaining a prediction label of each test data in the test data set by using the calibrated neural network model, and taking the prediction label as a correct label of each corresponding crowdsourcing data.
8. A storage medium in which a computer program is stored which, when being executed by a processor, is operative to carry out the method of any one of claims 1-6.
9. An electronic device comprising a processor and a memory, the memory having stored therein a computer program which, when executed by the processor, is operable to carry out the method of any of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110028695.4A CN112766337B (en) | 2021-01-11 | 2021-01-11 | Method and system for predicting correct tags for crowd-sourced data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112766337A true CN112766337A (en) | 2021-05-07 |
CN112766337B CN112766337B (en) | 2024-01-12 |
Family
ID=75701195
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110028695.4A Active CN112766337B (en) | 2021-01-11 | 2021-01-11 | Method and system for predicting correct tags for crowd-sourced data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112766337B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108898218A (en) * | 2018-05-24 | 2018-11-27 | 阿里巴巴集团控股有限公司 | A kind of training method of neural network model, device and computer equipment |
CN109543756A (en) * | 2018-11-26 | 2019-03-29 | 重庆邮电大学 | A kind of tag queries based on Active Learning and change method |
CN110070183A (en) * | 2019-03-11 | 2019-07-30 | 中国科学院信息工程研究所 | A kind of the neural network model training method and device of weak labeled data |
CN110580499A (en) * | 2019-08-20 | 2019-12-17 | 北京邮电大学 | deep learning target detection method and system based on crowdsourcing repeated labels |
CN110929807A (en) * | 2019-12-06 | 2020-03-27 | 腾讯科技(深圳)有限公司 | Training method of image classification model, and image classification method and device |
CN111275079A (en) * | 2020-01-13 | 2020-06-12 | 浙江大学 | Crowdsourcing label speculation method and system based on graph neural network |
-
2021
- 2021-01-11 CN CN202110028695.4A patent/CN112766337B/en active Active
Non-Patent Citations (7)
Title |
---|
GUOWEI XU, et al.: "Learning Effective Embeddings From Crowdsourced Labels: An Educational Case Study", 《ARXIV:1908.00086V1》 *
MING WU et al.: "Learning deep networks with crowdsourcing for relevance evaluation", 《EURASIP JOURNAL ON WIRELESS COMMUNICATIONS AND NETWORKING》 *
RYAN DRAPEAU: "MicroTalk: Using Argumentation to Improve Crowdsourcing Accuracy", 《PROCEEDINGS, THE FOURTH AAAI CONFERENCE ON HUMAN COMPUTATION AND CROWDSOURCING》, 31 December 2016 *
WEI WANG et al.: "Obtaining High-Quality Label by Distinguishing between Easy and Hard Items in Crowdsourcing", 《PROCEEDINGS OF THE 26TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI-17)》, 19 August 2017 *
LI Yinan et al.: "Label quality improvement method via feature dimension expansion for crowdsourced data", 《CAAI Transactions on Intelligent Systems》, no. 02 *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113419868A (en) * | 2021-08-23 | 2021-09-21 | 南方科技大学 | Temperature prediction method, device, equipment and storage medium based on crowdsourcing |
CN113419868B (en) * | 2021-08-23 | 2021-11-16 | 南方科技大学 | Temperature prediction method, device, equipment and storage medium based on crowdsourcing |
WO2023024213A1 (en) * | 2021-08-23 | 2023-03-02 | 南方科技大学 | Crowdsourcing-based temperature prediction method and apparatus, and device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN112766337B (en) | 2024-01-12 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |