CN105426826A - Tag noise correction based crowd-sourced tagging data quality improvement method - Google Patents


Info

Publication number
CN105426826A
Authority
CN
China
Prior art keywords
label
quality
sample
data
integrated
Legal status
Pending
Application number
CN201510754782.2A
Other languages
Chinese (zh)
Inventor
张静
Current Assignee
Individual
Original Assignee
Individual
Priority date
2015-11-09
Filing date
2015-11-09
Publication date
2016-03-23
Application filed by Individual
Priority to CN201510754782.2A
Publication of CN105426826A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/12 Classification; Matching

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method for improving the quality of crowdsourced labeled data based on label noise correction. The method comprises the following steps: running a label integration algorithm on an initial crowdsourced labeled data set to form a label-integrated data set, and estimating the quality of each annotator and of each sample's integrated label in the process; performing multiple rounds of K-fold cross-validation on the label-integrated data set and building high-quality data sets; determining a label noise set by combining the predicted class-label probabilities of each sample from the multiple rounds of K-fold cross-validation with the annotator quality and the samples' label quality; and training a classification model on the high-quality data sets generated during the cross-validation, then using it to re-predict and replace the class label of each sample in the label noise data set. This label noise correction reduces the number of potentially noisy labels remaining after label integration and thereby improves data quality.

Description

Method for improving the quality of crowdsourced labeled data based on label noise correction
Technical field
The invention belongs to the field of data labeling technology, and specifically relates to a method for improving the quality of crowdsourced labeled data based on label noise correction.
Background
Obtaining high-quality labeled data is foundational work in fields such as information retrieval, machine learning, and data mining. In supervised machine learning, the entire learning process consists of training a model on a moderately sized data set with class labels, so as to obtain a model that can accurately predict the labels of unlabeled samples. Traditionally, the class labels in training data are provided by experts in the application domain. Expert labels are highly accurate and help build high-quality models, but expert labeling is itself very expensive. With the development of intelligent computing technology, labeling demand keeps growing, and expert labeling can no longer meet it. The emergence of crowdsourcing systems has greatly alleviated this problem. Many labeling tasks, such as text annotation and image classification, can be published on the internet through a crowdsourcing platform and labeled by ordinary internet users, who complete the tasks in exchange for payment from the publisher.
Crowdsourced labeling lowers the cost of obtaining labeled data and shortens turnaround time. However, it has an inherent drawback: annotators are ordinary internet users, so compared with traditional expert labeling, the quality of their labels is not guaranteed. A widely adopted remedy is to have several different annotators label each sample and then apply a label integration method to obtain each sample's final label. Existing label integration algorithms include majority voting, the Dawid and Skene algorithm (DS), the algorithm proposed by Raykar et al. (RY), and the ZenCrowd algorithm. These algorithms model the crowdsourced labeling system from several angles, such as the annotators' expertise, the effort annotators put into their tasks, and the difficulty of the tasks themselves, and infer the integrated label of each sample. Related research has found that, although integration methods are diverse, no single algorithm is recognized as performing best, and in most cases the improvement in data quality after label integration is limited. Here data quality is defined as the degree of agreement between each sample's integrated label and its true label. Throughout the labeling process the true labels of all samples are unknown; the goal of label integration is to infer each sample's label correctly so that it matches its true label as closely as possible.
The main reason the above label integration algorithms cannot improve data quality further is that they use only the label information supplied by multiple unreliable annotators and ignore the feature information of the data itself. In the present invention, integrated labels that do not match the corresponding true labels are called "noise" labels. If the feature information of the existing data can be used to correct these noise labels, data quality can be improved further.
Summary of the invention
To address the above technical problems of the prior art, the invention provides a method for improving the quality of crowdsourced labeled data based on label noise correction. The overall technical scheme comprises the following steps:
(1) Run a label integration algorithm on the initial crowdsourced labeled data set D to obtain the label-integrated data set D_I, in which every sample has an integrated label. During this process, estimate the quality of each annotator and the quality of each sample's integrated label. Annotator quality is the probability that a label the annotator gives a sample equals that sample's true label. Integrated label quality is the probability that a sample's integrated label equals its true label.
(2) Perform M rounds of K-fold cross-validation on D_I: in each round, D_I is shuffled at random and divided into K parts; each part serves once as the test set while the remaining K-1 parts form the training set used to train a classifier, which then predicts the label of every sample in the test set. Each round of cross-validation builds one high-quality data set, giving M high-quality data sets HQ(1), HQ(2), ..., HQ(M) in total. Using the class-label prediction probabilities obtained for each sample in each round, together with the annotator quality and integrated label quality from step (1), rank all samples by their likelihood of being label noise and select a certain number of label noise samples; these form the label noise data set D_N. Removing the samples of D_N from D_I leaves the clean data set D_C, so that D_I = D_N ∪ D_C. M and K are parameters of the method: M is a positive integer not less than 1, and K is a positive integer not less than 3.
(3) Use the M high-quality data sets HQ(1), HQ(2), ..., HQ(M) from step (2) to train a classification model, use this model to re-predict the class labels of all samples in D_N, and replace the original class labels with the predicted ones, forming the corrected noise data set D_R.
(4) Merge D_R from step (3) and D_C from step (2) into a new enhanced data set D_E. D_E contains the same samples as D_I from step (1), but its label quality is higher.
The invention uses the feature attributes of the labeled samples themselves, combined with label noise handling techniques, to correct potential errors in the integrated labels. Compared with traditional methods that only perform label integration, the invention has the following beneficial effects:
(1) On top of a label integration method, the invention uses the feature attributes of the labeled samples to further correct potentially wrong integrated labels, improving the label quality of the final data set.
(2) The invention works with many different label integration methods and is therefore general.
The method applies to all kinds of crowdsourced data, including but not limited to binary and multi-class labeling of images, text, and video.
Brief description of the drawings
Fig. 1 is the overall framework diagram of the method of the invention.
Fig. 2 is a flow chart of one embodiment of the method of the invention.
Embodiment
To describe the invention more concretely, one embodiment is described in detail below with reference to the accompanying drawings.
Step (1): (crowdsourced label integration)
(1-1) Run a label integration algorithm on the initial crowdsourced data set D. The most commonly used algorithm is majority voting: for each sample i in the data set, count the labels it received from its annotators; if class c_k occurs most often, the integrated label of the sample is c_k. If more than one class ties for the highest count, one of them is chosen at random as the integrated label.
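(1-2) For any sample i in D_I, let its integrated label be $\hat{y}_i$ and let the label that annotator j gives sample i be $l_{ij}$. The labeling quality $p_j$ of annotator j is calculated as:
$$p_j = \frac{\sum_{i=1}^{I} \mathbb{I}(l_{ij} = \hat{y}_i)}{\sum_{i=1}^{I} \mathbb{I}(\text{annotator } j \text{ labeled sample } i)}$$
where I is the number of samples in D_I and $\mathbb{I}(\cdot)$ is the indicator function, which returns 1 when its condition holds and 0 otherwise.
With J annotators in total, the average labeling quality $\bar{p}$ of all annotators is calculated as:
$$\bar{p} = \frac{1}{J}\sum_{j=1}^{J} p_j$$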
For a sample i with N crowdsourced labels, its integrated label quality q is calculated as: [formula not available]
The bounds α and β on the estimated number of potential noise labels remaining after integration are calculated respectively as: [formulas not available]
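The majority-voting integration of step (1-1) and the annotator-quality estimate of step (1-2) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: the dictionary layout `{sample_id: {annotator_id: label}}` and all function names are hypothetical, annotator quality p_j is assumed to be the share of annotator j's labels that agree with the integrated labels, and the formulas for q, α and β are not reproduced.

```python
import random
from collections import Counter


def majority_vote(labels, rng):
    """Most frequent label; ties are broken uniformly at random (step 1-1)."""
    counts = Counter(labels)
    top = max(counts.values())
    tied = sorted(c for c, n in counts.items() if n == top)
    return rng.choice(tied)


def integrate(crowd, seed=0):
    """crowd: {sample_id: {annotator_id: label}} -> {sample_id: integrated label}."""
    rng = random.Random(seed)
    return {i: majority_vote(list(by_j.values()), rng) for i, by_j in crowd.items()}


def annotator_quality(crowd, integrated):
    """Assumed p_j: share of annotator j's labels that match the integrated label."""
    agree, total = Counter(), Counter()
    for i, by_j in crowd.items():
        for j, label in by_j.items():
            total[j] += 1
            agree[j] += label == integrated[i]
    return {j: agree[j] / total[j] for j in total}


def average_quality(quality):
    """Average labeling quality over all J annotators."""
    return sum(quality.values()) / len(quality)


crowd = {"s1": {"a": 1, "b": 1, "c": 0}, "s2": {"a": 0, "b": 0, "c": 0}}
labels = integrate(crowd)
quality = annotator_quality(crowd, labels)
print(labels, quality, round(average_quality(quality), 3))
# {'s1': 1, 's2': 0} {'a': 1.0, 'b': 1.0, 'c': 0.5} 0.833
```

A seeded `random.Random` instance keeps the random tie-breaking reproducible between runs.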
Step (2): (noise identification) This step takes two parameters, K and M: K is the number of folds in the K-fold cross-validation below, and M is the number of high-quality data sets to build. Typically K is set to 10 and M to 5.
(2-1) Step (2-1) is a loop of M rounds; each round m builds one high-quality data set HQ(m) and performs the related calculations, as follows:
(2-1-1) Shuffle the order of the samples in D_I at random and divide D_I into K equal parts. Each part is used once as the test set, with the other K-1 parts as the training set. Train a classifier L on these K-1 parts and use it to predict the samples in the test set.
(2-1-2) For each sample i in the test set, classifier L predicts the probabilities that i belongs to class 1, class 2, ..., class H, where H is the total number of classes, and a quantity is computed from these probabilities [formula not available]. If the predicted label of sample i differs from its integrated label from step (1), two per-sample statistics are updated [formulas not available]: one records how many times the sample's predicted label has differed from its integrated label, and the other describes the uncertainty of the sample's label. If the predicted label of sample i is the same as its integrated label from step (1), sample i is added to HQ(m).
(2-2) After the M high-quality data sets have been built, compute the following for every sample in D_I [formula not available], and sort all samples by this value in descending order.
(2-3) Compute the sample count θ [formula not available]. Finally, compute the number of samples n_r in the final noise set D_N [formula not available]. The first n_r samples in the descending order from step (2-2) are removed from D_I to form the noise data set D_N, and the remaining samples of D_I form the clean data set D_C.
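The noise-identification loop of step (2) can be sketched as below, assuming numeric feature vectors. Several parts are hypothetical stand-ins rather than the method's own definitions: a nearest-centroid classifier with softmax scores replaces whatever classifier an implementer would choose, the ranking score is simply the fraction of the M rounds in which a sample's prediction disagreed with its integrated label (the uncertainty measure and the formulas for θ and n_r are not reproduced), n_r is passed in directly, and ties in the ranking are broken by sample id.

```python
import math
import random


def centroid_proba(train_x, train_y):
    """Hypothetical stand-in classifier: nearest centroid with softmax scores."""
    sums, counts = {}, {}
    for x, c in zip(train_x, train_y):
        acc = sums.setdefault(c, [0.0] * len(x))
        for d, v in enumerate(x):
            acc[d] += v
        counts[c] = counts.get(c, 0) + 1
    centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}

    def predict_proba(x):
        # Softmax over negative squared distances gives a probability per class.
        score = {c: -sum((a - b) ** 2 for a, b in zip(x, m)) for c, m in centroids.items()}
        top = max(score.values())
        w = {c: math.exp(s - top) for c, s in score.items()}
        z = sum(w.values())
        return {c: v / z for c, v in w.items()}

    return predict_proba


def cv_round(features, integrated, k, mismatches, seed):
    """One round m of K-fold CV; returns HQ(m) and updates mismatch counts in place."""
    ids = list(integrated)
    random.Random(seed).shuffle(ids)
    folds = [ids[f::k] for f in range(k)]  # K near-equal parts
    hq = set()
    for f, test in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        proba = centroid_proba([features[i] for i in train], [integrated[i] for i in train])
        for i in test:
            p = proba(features[i])
            if max(p, key=p.get) == integrated[i]:
                hq.add(i)  # prediction agrees with the integrated label
            else:
                mismatches[i] = mismatches.get(i, 0) + 1
    return hq


def detect_noise(features, integrated, k, m_rounds, n_r, seed=0):
    """Run M rounds, rank samples by mismatch rate, split D_I into D_N and D_C."""
    mismatches = {}
    hq_sets = [cv_round(features, integrated, k, mismatches, seed + m) for m in range(m_rounds)]
    score = {i: mismatches.get(i, 0) / m_rounds for i in integrated}  # hypothetical score
    ranked = sorted(integrated, key=lambda i: (-score[i], str(i)))
    return set(ranked[:n_r]), set(ranked[n_r:]), hq_sets
```

In a toy run with two well-separated clusters and one point near the first cluster labeled as the second class, that point is mispredicted in every round and ends up alone in D_N.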
Step (3): (noise correction) For each sample i in D_N, perform the following steps:
(3-1) Remove sample i from each of the M high-quality data sets HQ(1), HQ(2), ..., HQ(M) and build M classifiers L(1), L(2), ..., L(M), one per data set; then use them to predict the class label of sample i, obtaining M predicted values.
(3-2) Apply majority voting to these M predicted values: count the votes for each class; if class c_k has the largest count, the corrected integrated label of the sample is c_k. If more than one class ties for the largest count, one of them is chosen at random as the final integrated label of the sample.
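Step (3) can be sketched as follows: for each noise sample, one classifier per high-quality set HQ(m), trained with the sample itself removed, casts a vote, and the votes are combined by majority voting with random tie-breaking. The 1-nearest-neighbour classifier and the function names are hypothetical stand-ins for the supervised learner chosen per data type, and keeping the old label when no training data remain is an assumption of this sketch, not part of the source.

```python
import random
from collections import Counter


def nearest_neighbour(train_ids, features, labels, x):
    """Hypothetical stand-in classifier: label of the closest training sample."""
    best = min(train_ids, key=lambda t: sum((a - b) ** 2 for a, b in zip(features[t], x)))
    return labels[best]


def correct_noise(noise_ids, hq_sets, features, integrated, seed=0):
    """Return D_R as {sample_id: corrected label} for every sample in D_N."""
    rng = random.Random(seed)
    corrected = {}
    for i in sorted(noise_ids, key=str):
        votes = []
        for hq in hq_sets:  # one classifier L(m) per high-quality set HQ(m)
            train = [t for t in hq if t != i]  # remove sample i from HQ(m)
            if train:
                votes.append(nearest_neighbour(train, features, integrated, features[i]))
        if not votes:  # assumption: nothing to learn from, keep the integrated label
            corrected[i] = integrated[i]
            continue
        counts = Counter(votes)
        top = max(counts.values())
        corrected[i] = rng.choice(sorted(c for c, n in counts.items() if n == top))
    return corrected
```

Training labels are the integrated labels, which within each HQ(m) agree with the cross-validated predictions by construction.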
Step (4): (data merging) Merge the processed data set D_N with D_C to form D_E. D_E has the same samples as D_I, but the class labels of the potential noise samples have been corrected by the above process, so the quality of the data set is improved.
In the above embodiment, the classification algorithm used to build the classifiers can be chosen according to the type of data being processed: for example, a Bayes classifier or a decision tree for text data, and a support vector machine or a neural network for image data. The classifiers are built following the standard procedure for machine learning classifiers.
The above embodiment does not limit the invention, and the invention is not restricted to it; any implementation that meets the requirements of the invention falls within its scope of protection.
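The merge in step (4) amounts to overlaying the corrected labels of D_N on the unchanged labels of D_C. A small sketch (the function name is hypothetical) that also checks the two invariants stated in the text, that D_C and D_N are disjoint and that D_E contains exactly the samples of D_I:

```python
def merge_enhanced(integrated, clean_ids, corrected):
    """D_E: samples of D_C keep their integrated labels; samples of D_N take D_R's labels."""
    clean_ids = set(clean_ids)
    if clean_ids & set(corrected):
        raise ValueError("D_C and D_N must be disjoint")
    enhanced = {i: integrated[i] for i in clean_ids}
    enhanced.update(corrected)
    if set(enhanced) != set(integrated):
        raise ValueError("D_E must contain exactly the samples of D_I")
    return enhanced
```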

Claims (9)

1. A method for improving the quality of crowdsourced labeled data based on label noise correction, comprising the following steps:
(1) running a label integration algorithm on the initial crowdsourced labeled data set to form a label-integrated data set, in which each sample obtains an integrated label, and estimating, during or after label integration, the quality of each annotator and the quality of each sample's integrated label, wherein annotator quality is defined as the probability that the annotator gives a correct label, and the integrated label quality of a sample is defined as the probability that its integrated label equals its true label;
(2) performing multiple rounds of K-fold cross-validation on the label-integrated data set and building a high-quality data set in each round of K-fold cross-validation, wherein the high-quality data set of each round is built as follows: before the round starts, the high-quality data set is empty; during the round, it is checked for each sample in the data set whether its integrated label agrees with the label predicted for it in this round, and if so, the sample is added to the high-quality data set;
(3) using the predicted probabilities of each sample's label classes from the multiple rounds of K-fold cross-validation, together with the annotator quality and label quality, determining the label noise samples and separating them from the label-integrated data set to form a label noise data set, the remaining samples forming a clean data set;
(4) training a classification model with the high-quality data sets produced during the multiple rounds of K-fold cross-validation, and using the model to re-predict and replace the labels of the samples in the label noise data set;
(5) merging the processed label noise data set with the clean data set to form a quality-enhanced data set.
2. The method according to claim 1, comprising a label integration algorithm run on the initial crowdsourced labeled data set, characterized in that the algorithm uses at least the labels that crowdsourced annotators give each sample in the data set to estimate each sample's true label; this estimated label is called the integrated label.
3. The method according to claim 1, comprising estimating the quality of each annotator, characterized in that the annotator quality estimate is either provided directly by the label integration algorithm or computed from its results.
4. The method according to claim 1, comprising estimating the quality of each integrated sample label, characterized in that the integrated label quality estimate is either provided directly by the label integration algorithm or computed from its results.
5. The method according to claim 1, comprising a step of identifying noise samples, characterized in that: the probability that each sample's label belongs to each class is determined in K-fold cross-validation; using the probabilities predicted over multiple rounds of K-fold cross-validation, the degree to which each sample's label is likely to be noise is computed, and all samples are ranked by this degree; the number of noise samples is determined from the annotator quality and the integrated label quality together with the degree to which sample labels are likely to be noise; and the label noise samples are determined according to this number and the ranking.
6. The method according to claim 1, comprising dividing the label-integrated data set into a noise data set and a clean data set, characterized in that the clean data set is formed by removing the label noise samples from the label-integrated data set, and the labels of the samples in it do not change in subsequent steps.
7. The method according to claim 1, comprising training a classification model with the high-quality data sets produced during the multiple rounds of K-fold cross-validation and using the model to re-predict and replace the labels of the samples in the label noise data set, characterized in that one or more supervised classification models are trained on the high-quality data sets to build one or more classifiers, and the labels of the label noise samples are re-predicted and replaced using one of these classifiers alone or several of them in combination.
8. The method according to claim 1, comprising finally forming an enhanced data set, characterized in that this data set is formed by merging the corrected noise data set with the clean data set; it has the same samples as the label-integrated data set, but the quality of its integrated labels is improved.
9. The method according to claim 7, characterized in that the method of training the classification model selects a suitable supervised classification training algorithm according to the domain of the data set being processed.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510754782.2A CN105426826A (en) 2015-11-09 2015-11-09 Tag noise correction based crowd-sourced tagging data quality improvement method

Publications (1)

Publication Number Publication Date
CN105426826A 2016-03-23

Family

ID=55505026

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510754782.2A Pending CN105426826A (en) 2015-11-09 2015-11-09 Tag noise correction based crowd-sourced tagging data quality improvement method

Country Status (1)

Country Link
CN (1) CN105426826A (en)

Cited By (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107067105A (en) * 2017-04-07 2017-08-18 华东师范大学 A kind of mass-rent strategy distribution method being grouped based on optimal data
CN107808661A (en) * 2017-10-23 2018-03-16 中央民族大学 A kind of Tibetan voice corpus labeling method and system based on collaborative batch Active Learning
CN107844740A (en) * 2017-09-05 2018-03-27 中国地质调查局西安地质调查中心 A kind of offline handwriting, printing Chinese character recognition methods and system
CN107871196A (en) * 2016-09-28 2018-04-03 郑州大学 A kind of mass-rent method for evaluating quality based on slip task window
CN108121814A (en) * 2017-12-28 2018-06-05 北京百度网讯科技有限公司 Search results ranking model generating method and device
CN108197202A (en) * 2017-12-28 2018-06-22 百度在线网络技术(北京)有限公司 Data verification method, device, server and the storage medium of crowdsourcing task
CN108446695A (en) * 2018-02-06 2018-08-24 阿里巴巴集团控股有限公司 Method, apparatus and electronic equipment for data mark
CN108509969A (en) * 2017-09-06 2018-09-07 腾讯科技(深圳)有限公司 Data mask method and terminal
CN108647858A (en) * 2018-04-12 2018-10-12 华东师范大学 A kind of collaboration crowdsourcing method of quality control based on user's inconsistency information
CN108734296A (en) * 2017-04-21 2018-11-02 北京京东尚科信息技术有限公司 Optimize method, apparatus, electronic equipment and the medium of the training data of supervised learning
CN108875821A (en) * 2018-06-08 2018-11-23 Oppo广东移动通信有限公司 The training method and device of disaggregated model, mobile terminal, readable storage medium storing program for executing
CN109086814A (en) * 2018-07-23 2018-12-25 腾讯科技(深圳)有限公司 A kind of data processing method, device and the network equipment
CN109189767A (en) * 2018-08-01 2019-01-11 北京三快在线科技有限公司 Data processing method, device, electronic equipment and storage medium
CN109214343A (en) * 2018-09-14 2019-01-15 北京字节跳动网络技术有限公司 Method and apparatus for generating face critical point detection model
CN109241513A (en) * 2018-08-27 2019-01-18 上海宝尊电子商务有限公司 A kind of method and device based on big data crowdsourcing model data mark
CN109272003A (en) * 2017-07-17 2019-01-25 华东师范大学 A kind of method and apparatus for eliminating unknown error in deep learning model
CN109284315A (en) * 2018-08-24 2019-01-29 大连莫比嗨客智能科技有限公司 A kind of label data Statistical Inference under crowdsourcing model
CN109376260A (en) * 2018-09-26 2019-02-22 四川长虹电器股份有限公司 A kind of method and system of deep learning image labeling
CN109426834A (en) * 2017-08-31 2019-03-05 佳能株式会社 Information processing unit, information processing method and information processing system
CN109543693A (en) * 2018-11-28 2019-03-29 中国人民解放军国防科技大学 Weak labeling data noise reduction method based on regularization label propagation
CN110060247A (en) * 2019-04-18 2019-07-26 深圳市深视创新科技有限公司 Cope with the robust deep neural network learning method of sample marking error
CN110084290A (en) * 2019-04-12 2019-08-02 北京字节跳动网络技术有限公司 Method, apparatus, electronic equipment and the computer readable storage medium of training classifier
CN110163376A (en) * 2018-06-04 2019-08-23 腾讯科技(深圳)有限公司 Sample testing method, the recognition methods of media object, device, terminal and medium
CN110363228A (en) * 2019-06-26 2019-10-22 南京理工大学 Noise label correcting method
CN110580499A (en) * 2019-08-20 2019-12-17 北京邮电大学 deep learning target detection method and system based on crowdsourcing repeated labels
CN110705607A (en) * 2019-09-12 2020-01-17 西安交通大学 Industry multi-label noise reduction method based on cyclic re-labeling self-service method
CN110929807A (en) * 2019-12-06 2020-03-27 腾讯科技(深圳)有限公司 Training method of image classification model, and image classification method and device
CN111288999A (en) * 2020-02-19 2020-06-16 深圳大学 Pedestrian road network attribute detection method, device and equipment based on mobile terminal
CN111444937A (en) * 2020-01-15 2020-07-24 湖州师范学院 Crowdsourcing quality improvement method based on integrated TSK fuzzy classifier
CN111814883A (en) * 2020-07-10 2020-10-23 重庆大学 Label noise correction method based on heterogeneous integration
CN112000808A (en) * 2020-09-29 2020-11-27 迪爱斯信息技术股份有限公司 Data processing method and device and readable storage medium
CN112148986A (en) * 2020-10-09 2020-12-29 安徽大学 Crowdsourcing-based top-N service re-recommendation method and system
CN112988733A (en) * 2021-04-16 2021-06-18 北京妙医佳健康科技集团有限公司 Method and device for improving and enhancing data quality
CN113139580A (en) * 2021-03-23 2021-07-20 杭州电子科技大学 Crowd-sourced data truth value reasoning method for integrated weighted majority soft voting
CN113343695A (en) * 2021-05-27 2021-09-03 镁佳(北京)科技有限公司 Text labeling noise detection method and device, storage medium and electronic equipment
CN113688949A (en) * 2021-10-25 2021-11-23 南京码极客科技有限公司 Network image data set denoising method based on dual-network joint label correction
CN114611715A (en) * 2022-05-12 2022-06-10 之江实验室 Crowd-sourcing active learning method and device based on annotator reliability time sequence modeling

Citations (3)

Publication number Priority date Publication date Assignee Title
CN103324620A (en) * 2012-03-20 2013-09-25 北京百度网讯科技有限公司 Method and device for rectifying marking results
CN104573359A (en) * 2014-12-31 2015-04-29 浙江大学 Method for integrating crowdsource annotation data based on task difficulty and annotator ability
CN104915388A (en) * 2015-03-11 2015-09-16 浙江大学 Book tag recommendation method based on spectral clustering and crowdsourcing technology

Non-Patent Citations (1)

Title
JING ZHANG et al.: "Improving Label Quality in Crowdsourcing Using Noise Correction", CIKM 2015 *

Cited By (60)

Publication number Priority date Publication date Assignee Title
CN107871196A (en) * 2016-09-28 2018-04-03 郑州大学 A kind of mass-rent method for evaluating quality based on slip task window
CN107067105A (en) * 2017-04-07 2017-08-18 华东师范大学 A kind of mass-rent strategy distribution method being grouped based on optimal data
CN108734296A (en) * 2017-04-21 2018-11-02 北京京东尚科信息技术有限公司 Optimize method, apparatus, electronic equipment and the medium of the training data of supervised learning
CN109272003A (en) * 2017-07-17 2019-01-25 华东师范大学 A kind of method and apparatus for eliminating unknown error in deep learning model
US11636378B2 (en) 2017-08-31 2023-04-25 Canon Kabushiki Kaisha Information processing apparatus, information processing method, and information processing system
CN109426834B (en) * 2017-08-31 2022-05-31 佳能株式会社 Information processing apparatus, information processing method, and information processing system
CN109426834A (en) * 2017-08-31 2019-03-05 佳能株式会社 Information processing unit, information processing method and information processing system
CN107844740A (en) * 2017-09-05 2018-03-27 中国地质调查局西安地质调查中心 A kind of offline handwriting, printing Chinese character recognition methods and system
CN108509969B (en) * 2017-09-06 2021-11-09 腾讯科技(深圳)有限公司 Data labeling method and terminal
CN108509969A (en) * 2017-09-06 2018-09-07 腾讯科技(深圳)有限公司 Data mask method and terminal
CN107808661B (en) * 2017-10-23 2020-12-11 中央民族大学 Tibetan language voice corpus labeling method and system based on collaborative batch active learning
CN107808661A (en) * 2017-10-23 2018-03-16 中央民族大学 A kind of Tibetan voice corpus labeling method and system based on collaborative batch Active Learning
CN108197202B (en) * 2017-12-28 2021-12-24 百度在线网络技术(北京)有限公司 Data verification method and device for crowdsourcing task, server and storage medium
CN108121814B (en) * 2017-12-28 2022-04-22 北京百度网讯科技有限公司 Search result ranking model generation method and device
CN108197202A (en) * 2017-12-28 2018-06-22 百度在线网络技术(北京)有限公司 Data verification method, device, server and the storage medium of crowdsourcing task
CN108121814A (en) * 2017-12-28 2018-06-05 北京百度网讯科技有限公司 Search results ranking model generating method and device
CN108446695A (en) * 2018-02-06 2018-08-24 阿里巴巴集团控股有限公司 Method, apparatus and electronic equipment for data mark
CN108446695B (en) * 2018-02-06 2022-02-11 创新先进技术有限公司 Method and device for data annotation and electronic equipment
CN108647858A (en) * 2018-04-12 2018-10-12 华东师范大学 A kind of collaboration crowdsourcing method of quality control based on user's inconsistency information
CN110163376B (en) * 2018-06-04 2023-11-03 腾讯科技(深圳)有限公司 Sample detection method, media object identification method, device, terminal and medium
CN110163376A (en) * 2018-06-04 2019-08-23 腾讯科技(深圳)有限公司 Sample testing method, the recognition methods of media object, device, terminal and medium
CN108875821A (en) * 2018-06-08 2018-11-23 Oppo广东移动通信有限公司 The training method and device of disaggregated model, mobile terminal, readable storage medium storing program for executing
US11138478B2 (en) 2018-06-08 2021-10-05 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Method and apparatus for training, classification model, mobile terminal, and readable storage medium
CN109086814A (en) * 2018-07-23 2018-12-25 腾讯科技(深圳)有限公司 Data processing method, device and network equipment
CN109086814B (en) * 2018-07-23 2021-05-14 腾讯科技(深圳)有限公司 Data processing method and device and network equipment
CN109189767A (en) * 2018-08-01 2019-01-11 北京三快在线科技有限公司 Data processing method, device, electronic equipment and storage medium
CN109189767B (en) * 2018-08-01 2021-07-23 北京三快在线科技有限公司 Data processing method and device, electronic equipment and storage medium
CN109284315A (en) * 2018-08-24 2019-01-29 大连莫比嗨客智能科技有限公司 Label data statistical inference method in crowdsourcing mode
CN109284315B (en) * 2018-08-24 2021-04-23 深圳莫比嗨客树莓派智能机器人有限公司 Label data statistical inference method in crowdsourcing mode
CN109241513A (en) * 2018-08-27 2019-01-18 上海宝尊电子商务有限公司 Method and device for data labeling based on big data crowdsourcing mode
CN109214343A (en) * 2018-09-14 2019-01-15 北京字节跳动网络技术有限公司 Method and apparatus for generating face critical point detection model
CN109376260B (en) * 2018-09-26 2021-10-01 四川长虹电器股份有限公司 Method and system for deep learning image annotation
CN109376260A (en) * 2018-09-26 2019-02-22 四川长虹电器股份有限公司 Method and system for deep learning image labeling
CN109543693A (en) * 2018-11-28 2019-03-29 中国人民解放军国防科技大学 Weak labeling data noise reduction method based on regularization label propagation
CN109543693B (en) * 2018-11-28 2021-05-07 中国人民解放军国防科技大学 Weak labeling data noise reduction method based on regularization label propagation
CN110084290A (en) * 2019-04-12 2019-08-02 北京字节跳动网络技术有限公司 Method, apparatus, electronic device and computer-readable storage medium for training classifier
CN110084290B (en) * 2019-04-12 2021-03-05 北京字节跳动网络技术有限公司 Method, apparatus, electronic device and computer-readable storage medium for training classifier
CN110060247A (en) * 2019-04-18 2019-07-26 深圳市深视创新科技有限公司 Robust deep neural network learning method for coping with sample labeling errors
CN110363228A (en) * 2019-06-26 2019-10-22 南京理工大学 Noise label correction method
CN110363228B (en) * 2019-06-26 2022-09-06 南京理工大学 Noise label correction method
CN110580499B (en) * 2019-08-20 2022-05-24 北京邮电大学 Deep learning target detection method and system based on crowdsourcing repeated labels
CN110580499A (en) * 2019-08-20 2019-12-17 北京邮电大学 Deep learning target detection method and system based on crowdsourcing repeated labels
CN110705607A (en) * 2019-09-12 2020-01-17 西安交通大学 Industry multi-label noise reduction method based on cyclic re-labeling bootstrap method
CN110705607B (en) * 2019-09-12 2022-10-25 西安交通大学 Industry multi-label noise reduction method based on cyclic re-labeling bootstrap method
CN110929807A (en) * 2019-12-06 2020-03-27 腾讯科技(深圳)有限公司 Training method of image classification model, and image classification method and device
CN111444937B (en) * 2020-01-15 2023-05-12 湖州师范学院 Crowd-sourced quality improvement method based on integrated TSK fuzzy classifier
CN111444937A (en) * 2020-01-15 2020-07-24 湖州师范学院 Crowdsourcing quality improvement method based on integrated TSK fuzzy classifier
CN111288999B (en) * 2020-02-19 2021-08-31 深圳大学 Pedestrian road network attribute detection method, device and equipment based on mobile terminal
CN111288999A (en) * 2020-02-19 2020-06-16 深圳大学 Pedestrian road network attribute detection method, device and equipment based on mobile terminal
CN111814883A (en) * 2020-07-10 2020-10-23 重庆大学 Label noise correction method based on heterogeneous integration
CN112000808B (en) * 2020-09-29 2024-04-16 迪爱斯信息技术股份有限公司 Data processing method and device and readable storage medium
CN112000808A (en) * 2020-09-29 2020-11-27 迪爱斯信息技术股份有限公司 Data processing method and device and readable storage medium
CN112148986B (en) * 2020-10-09 2022-09-30 安徽大学 Top-N service re-recommendation method and system based on crowdsourcing
CN112148986A (en) * 2020-10-09 2020-12-29 安徽大学 Crowdsourcing-based top-N service re-recommendation method and system
CN113139580A (en) * 2021-03-23 2021-07-20 杭州电子科技大学 Crowd-sourced data truth value reasoning method for integrated weighted majority soft voting
CN112988733A (en) * 2021-04-16 2021-06-18 北京妙医佳健康科技集团有限公司 Method and device for improving and enhancing data quality
CN113343695A (en) * 2021-05-27 2021-09-03 镁佳(北京)科技有限公司 Text labeling noise detection method and device, storage medium and electronic equipment
CN113688949A (en) * 2021-10-25 2021-11-23 南京码极客科技有限公司 Network image data set denoising method based on dual-network joint label correction
CN114611715B (en) * 2022-05-12 2022-08-23 之江实验室 Crowd-sourcing active learning method and device based on annotator reliability time sequence modeling
CN114611715A (en) * 2022-05-12 2022-06-10 之江实验室 Crowd-sourcing active learning method and device based on annotator reliability time sequence modeling

Similar Documents

Publication Title
CN105426826A (en) Tag noise correction based crowd-sourced tagging data quality improvement method
CN108376267B (en) Zero sample classification method based on class transfer
CN108052862B (en) Age estimation method and device
CN109376796A (en) Image classification method based on active semi-supervised learning
CN109086825A (en) A kind of more disaggregated model fusion methods based on model adaptation selection
CN105825233B (en) A kind of pedestrian detection method based on on-line study random fern classifier
CN105095884A (en) Pedestrian recognition system and pedestrian recognition processing method based on random forest support vector machine
CN110363228B (en) Noise label correction method
CN110705607A (en) Industry multi-label noise reduction method based on cyclic re-labeling self-service method
CN110751191A (en) Image classification method and system
WO2020024444A1 (en) Group performance grade recognition method and apparatus, and storage medium and computer device
CN111241987B (en) Multi-target model visual tracking method based on cost-sensitive three-branch decision
CN111861679A (en) Commodity recommendation method based on artificial intelligence
Arbel et al. Classifier evaluation under limited resources
CN110807601A (en) Park road deterioration analysis method based on truncation data
CN107169830B (en) Personalized recommendation method based on clustering PU matrix decomposition
CN111914772B (en) Age identification method, age identification model training method and device
CN110188791A (en) Based on the visual emotion label distribution forecasting method estimated automatically
CN110490053B (en) Human face attribute identification method based on trinocular camera depth estimation
CN106611036A (en) Improved multidimensional scaling heterogeneous cost-sensitive decision tree building method
CN110349119B (en) Pavement disease detection method and device based on edge detection neural network
CN112541010A (en) User gender prediction method based on logistic regression
CN115730152A (en) Big data processing method and big data processing system based on user portrait analysis
CN112508135B (en) Model training method, pedestrian attribute prediction method, device and equipment
CN115511012A (en) Class soft label recognition training method for maximum entropy constraint

Legal Events

Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20160323