CN105426826A - Tag noise correction based crowd-sourced tagging data quality improvement method - Google Patents
- Publication number
- CN105426826A (application number CN201510754782.2A)
- Authority
- CN
- China
- Prior art keywords
- label
- quality
- sample
- data
- integrated
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2218/00—Aspects of pattern recognition specially adapted for signal processing
- G06F2218/12—Classification; Matching
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a tag noise correction based crowd-sourced tagging data quality improvement method. The method comprises the following steps: running a tag integration algorithm in an initial crowd-sourced tagging data set to form a data set after tag integration, and estimating tagger quality and integrated tag quality information of samples in the process; performing multi-round K-fold cross validation by utilizing the data set after tag integration, and constructing a high-quality data set; determining a tag noise set in combination with the tagger quality and the tag quality of the samples by utilizing a prediction probability of a class tag of each sample in the multi-round K-fold cross validation process; and training a classification model by utilizing the high-quality data set generated in the multi-round K-fold cross validation process, and performing prediction and replacement on the class tag of each sample in the tag noise data set by using the model. With the tag noise correction method, the quantity of potential noise tag samples in the data set after original tag integration is reduced, so that the data quality is improved.
Description
Technical field
The invention belongs to the technical field of data labeling, and specifically relates to a method for improving the quality of crowdsourced labeled data based on label noise correction.
Background
Obtaining high-quality labeled data is foundational work in fields such as information retrieval, machine learning, and data mining. In supervised learning, the whole learning process consists of training a model on a moderately sized data set with class labels, so as to obtain a model that can accurately predict unlabeled samples. Traditionally, the class labels in the training data are provided by experts in the application domain. Expert labels are highly accurate and help build high-quality models, but expert labeling is expensive. With the development of intelligent computing technology, labeling demand keeps growing, and expert labeling alone can no longer meet it. The emergence of crowdsourcing systems has greatly alleviated this problem. Many labeling tasks, such as text annotation and image classification, can be published on the Internet through a crowdsourcing platform and labeled by ordinary Internet users, who complete the tasks in return for payment from the publisher.
Crowdsourced labeling lowers the cost of obtaining labeled data and shortens the time needed. However, it has an inherent defect: the annotators are ordinary Internet users, so, compared with traditional expert labeling, the labeling quality is not guaranteed. A widely adopted remedy is to have several different annotators label each sample and then apply a label integration method to obtain the final label of each sample. Existing label integration algorithms include majority voting, the Dawid-Skene algorithm (DS), the algorithm proposed by Raykar et al. (RY), and ZenCrowd. These algorithms model the crowdsourced labeling system from several angles, such as the annotators' expertise, the effort annotators devote to the task, and the difficulty of the task itself, and infer the integrated label of each sample. Related studies have found that, although integration methods vary widely, no single algorithm is recognized as performing best, and in most cases the improvement in data quality after label integration is limited. Here, data quality is defined as the degree of agreement between the integrated labels of the samples and their true labels. Throughout the processing of the labeled data, the true labels of all samples are unknown; the goal of label integration is to infer the label of each sample correctly, so that it matches the true label as closely as possible.
The main reason the above label integration algorithms cannot improve data quality further is that they use only the label information from multiple unreliable annotators and ignore the feature information of the data itself. In the present invention, an integrated label that does not match the true label is called a "noise" label. If the feature information of the available data can be used to correct these noise labels, data quality can be improved further.
Summary of the invention
In view of the above technical problems in the prior art, the invention provides a method for improving the quality of crowdsourced labeled data based on label noise correction. The method comprises the following steps:
(1) Run a label integration algorithm on the initial crowdsourced labeled data set D to obtain the label-integrated data set D_I, in which every sample has one integrated label. In this process, estimate the quality of each annotator and the quality of each integrated sample label. Annotator quality is the probability that the label an annotator gives a sample equals the sample's true label. Integrated label quality is the probability that a sample's integrated label equals its true label.
(2) Perform M rounds of K-fold cross-validation on D_I: after randomly shuffling D_I, divide it into K parts; each part serves in turn as the test set, and the remaining K-1 parts serve as the training set on which a classifier is trained. This classifier predicts the label of every sample in the test set. Each round of cross-validation builds one high-quality data set, so M high-quality data sets HQ^(1), HQ^(2), ..., HQ^(M) are built in total. Using the class prediction probabilities that each sample obtains in each round of cross-validation, combined with the annotator quality and integrated label quality obtained in step (1), rank all samples by their likelihood of being label noise and select a certain number of label noise samples; these samples form the label noise data set D_N. Deleting the samples that belong to D_N from D_I leaves the clean data set D_C, so that D_I = D_N + D_C. M and K are parameters of the method: M is a positive integer not less than 1, and K is a positive integer not less than 3.
(3) Train classification models on the M high-quality data sets HQ^(1), HQ^(2), ..., HQ^(M) from step (2), use these models to predict again the class labels of all samples in the noise data set D_N, and replace the original class labels with the predicted ones, forming the corrected noise data set D_R.
(4) Merge D_R from step (3) and D_C from step (2) into a new enhanced data set D_E. D_E contains the same samples as D_I from step (1), but its label quality is higher than that of D_I.
The invention uses the feature attributes of the labeled samples themselves, combined with label noise handling techniques, to correct potential errors in the integrated labels. Compared with conventional methods that perform label integration only, the invention has the following beneficial effects:
(1) On top of a label integration method, the invention uses the feature attributes of the labeled samples to further correct potentially wrong integrated labels, improving the label quality of the final data set.
(2) The invention works with a variety of label integration methods and is therefore general.
The method applies to all types of crowdsourced data, including but not limited to binary and multi-class labeling for tasks involving images, text, and video.
Brief description of the drawings
Fig. 1 is the overall framework diagram of the method of the invention.
Fig. 2 is a flow chart of one embodiment of the method of the invention.
Detailed description of the embodiments
To describe the invention more concretely, an embodiment of the invention is described in detail below with reference to the accompanying drawings.
Step (1): (crowdsourced label integration)
(1-1) Run a label integration algorithm on the initial crowdsourced data set D. The most commonly used algorithm is majority voting. For each sample i in the data set, this algorithm counts the labels that the sample received from multiple annotators; if the label of class c_k has the largest count, the integrated label of the sample is c_k. If more than one class has the largest count, one of them is selected at random as the integrated label of the sample.
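The majority voting described in step (1-1) can be sketched as follows. This is a minimal illustration, not the patent's implementation; the function names `majority_vote` and `integrate` and the dictionary layout of the crowd labels are assumptions made for the example.

```python
import random
from collections import Counter


def majority_vote(labels, rng=random):
    """Return the most frequent label; ties are broken uniformly at random."""
    counts = Counter(labels)
    top = max(counts.values())
    # Sort the tied classes so the choice depends only on the random source.
    winners = sorted(c for c, n in counts.items() if n == top)
    return winners[0] if len(winners) == 1 else rng.choice(winners)


def integrate(crowd_labels, rng=random):
    """Map each sample id to its integrated label.

    crowd_labels (hypothetical layout): {sample_id: [label, label, ...]}.
    """
    return {i: majority_vote(ls, rng) for i, ls in crowd_labels.items()}
```

Calling `integrate({"s1": [0, 0, 1], "s2": [1, 1, 1]})` gives one integrated label per sample; only samples with tied counts depend on the random source.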
(1-2) For any sample i in the data set D_I, consider its integrated label and the label that annotator j gave to sample i. The labeling quality of annotator j is calculated as:
[formula not reproduced in this text]
where I is the number of samples in D_I, and the indicator function returns 1 when its condition holds and 0 otherwise.
With J annotators in total, the average labeling quality of all annotators is calculated as:
[formula not reproduced in this text]
For a sample i with N crowdsourced labels, its integrated label quality q is calculated as:
[formula not reproduced in this text]
The bounds α and β on the estimated number of potential noise labels after label integration are calculated respectively as:
[formula not reproduced in this text]
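Since the formulas of step (1-2) do not survive in this text, the sketch below only illustrates one plausible reading of the annotator quality: the fraction of an annotator's labels that agree with the integrated labels, with the average taken over all annotators. This reading, the function names, and the nested-dictionary layout are assumptions. When every annotator labels every sample, the denominator used here equals I, the number of samples in D_I.

```python
def annotator_quality(annotations, integrated):
    """Agreement rate of each annotator with the integrated labels.

    annotations (hypothetical layout): {sample_id: {annotator_id: label}}
    integrated: {sample_id: integrated_label}
    """
    agree, total = {}, {}
    for i, by_annotator in annotations.items():
        for j, label in by_annotator.items():
            total[j] = total.get(j, 0) + 1
            # The comparison plays the role of the indicator function (1 or 0).
            agree[j] = agree.get(j, 0) + (label == integrated[i])
    return {j: agree[j] / total[j] for j in total}


def average_quality(quality):
    """Mean labeling quality over all J annotators."""
    return sum(quality.values()) / len(quality)
```

The integrated label quality q and the bounds α and β are not sketched, because the text gives no description from which they could be recovered.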
Step (2): (noise identification) This step requires two parameters, K and M, where K is the number of folds in the K-fold cross-validation below and M is the number of high-quality data sets to be built. Typically K is set to 10 and M to 5.
(2-1) Step (2-1) is a loop of M rounds. Each round m builds one high-quality data set HQ^(m) and performs the related calculations, as follows:
(2-1-1) Randomly shuffle the order of the samples in D_I and divide D_I into K equal parts. Each part is used once as the test set, with the other K-1 parts as the training set. Train a classifier on these K-1 parts and use it to predict the samples in the test set.
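The shuffling and splitting of step (2-1-1) can be sketched with the standard library alone. The generator name `k_fold_rounds`, the `seed` parameter, and the striding used to form the K parts are assumptions made for this illustration; classifier training is left out.

```python
import random


def k_fold_rounds(sample_ids, k, m, seed=0):
    """Yield (round, train_ids, test_ids) for M rounds of K-fold cross-validation."""
    rng = random.Random(seed)
    for r in range(1, m + 1):
        ids = list(sample_ids)
        rng.shuffle(ids)  # reshuffle D_I at the start of every round
        folds = [ids[f::k] for f in range(k)]  # K parts of near-equal size
        for f in range(k):
            test = folds[f]
            # The other K-1 parts together form the training set.
            train = [i for g in range(k) if g != f for i in folds[g]]
            yield r, train, test
```

With K = 10 and M = 5, the typical values given above, the generator yields 50 train/test splits, and within each round every sample appears in exactly one test set.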
(2-1-2) The classifier built in round m predicts each sample i, giving the probabilities that sample i belongs to class 1, class 2, ..., class H, where H is the total number of classes. From these probabilities, a quantity describing the degree of uncertainty of the sample's label is calculated [formula not reproduced in this text]. If the predicted label of sample i differs from its integrated label obtained in step (1), a counter that records how many times the predicted label of each sample differs from its integrated label from step (1) is updated [formula not reproduced in this text]. If the predicted label of sample i is identical to its integrated label obtained in step (1), sample i is added to HQ^(m).
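The bookkeeping of step (2-1-2) that is fully described in the text, namely adding agreeing samples to HQ^(m) and counting disagreements, can be sketched as below. The function name `update_round` and the dictionary arguments are assumptions; the uncertainty quantity is omitted because its formula is not available here.

```python
def update_round(integrated, predicted, disagreements):
    """Build HQ^(m) for one round and update the disagreement counter in place.

    integrated: {sample_id: integrated label from step (1)}
    predicted: {sample_id: label predicted in this cross-validation round}
    disagreements: {sample_id: number of rounds in which the two differed}
    """
    hq = []
    for i, y_hat in predicted.items():
        if y_hat == integrated[i]:
            hq.append(i)  # prediction agrees with the integrated label
        else:
            disagreements[i] = disagreements.get(i, 0) + 1
    return hq
```

Calling the function once per round with a shared `disagreements` dictionary accumulates, for each sample, the number of rounds in which the classifier contradicted the integrated label.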
(2-2) After the M high-quality data sets have been built, calculate for every sample in D_I:
[formula not reproduced in this text]
and sort all samples in descending order of this value.
(2-3) Calculate the number θ of samples that satisfy a condition [not reproduced in this text]. Finally, calculate by formula [not reproduced in this text] the number N_r of samples in the finally selected noise set D_N. The first N_r samples in the descending order established above are deleted from the data set D_I and form the noise data set D_N; the data remaining in D_I form the clean data set D_C.
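The final selection of step (2-3) reduces to taking the top N_r samples of a descending ranking. The sketch below treats the per-sample score and N_r as given inputs, since their formulas are not reproduced in this text; the name `split_noise` is an assumption.

```python
def split_noise(scores, n_r):
    """Split sample ids into (noise_ids, clean_ids).

    scores: {sample_id: noise score}, where higher means more likely noise
    n_r: number of samples to place in the noise set D_N
    """
    ranked = sorted(scores, key=lambda i: scores[i], reverse=True)
    return ranked[:n_r], ranked[n_r:]
```

Every sample ends up in exactly one of the two lists, which matches the relation D_I = D_N + D_C stated in the summary.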
Step (3): (noise correction) The noise correction process performs the following steps for each sample i in the data set D_N:
(3-1) From each of the M high-quality data sets HQ^(1), HQ^(2), ..., HQ^(M), remove sample i if it is present, and build M classifiers L^(1), L^(2), ..., L^(M), one per data set. Then use them to predict the class label of sample i, obtaining M predicted values.
(3-2) Apply majority voting to these M predicted values: count the votes for each class, and if the label of class c_k has the largest count, the corrected integrated label of the sample is c_k. If more than one class has the largest count, one of them is selected at random as the final integrated label of the sample.
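Steps (3-1) and (3-2) can be illustrated with a toy classifier. The sketch uses a 1-nearest-neighbour rule purely as a stand-in, since the patent leaves the choice of classifier open; the function names, the tuple-of-floats feature layout, and the fallback to the original label when no classifier can be built are all assumptions.

```python
import random
from collections import Counter


def nearest_neighbour_label(train, x):
    """train: list of (features, label); return the label of the closest point."""
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(train, key=lambda fl: dist2(fl[0], x))[1]


def correct_label(sample_id, features, labels, hq_sets, rng=random):
    """Predict sample_id with one model per HQ set, then majority-vote."""
    votes = []
    for hq in hq_sets:
        # Step (3-1): leave the sample itself out of the training data.
        train = [(features[i], labels[i]) for i in hq if i != sample_id]
        if train:
            votes.append(nearest_neighbour_label(train, features[sample_id]))
    if not votes:
        return labels[sample_id]  # assumption: keep the label if nothing can vote
    counts = Counter(votes)
    top = max(counts.values())
    # Step (3-2): majority vote with random tie-breaking.
    return rng.choice(sorted(c for c, n in counts.items() if n == top))
```

In practice the stand-in classifier would be replaced by whatever supervised learner suits the data type, trained on each HQ^(m) in the same leave-one-out fashion.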
Step (4): (data merging) Merge the data set D_N, as processed by the above steps, with the data set D_C to form the data set D_E. D_E contains the same samples as D_I, but the class labels of the potential noise samples have been corrected by the above process, so the quality of the data set is improved.
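The merge of step (4) is a disjoint union of two label maps. The sketch below assumes both data sets are represented as dictionaries from sample id to label; the name `merge` and the overlap check are additions for the example.

```python
def merge(corrected_noise, clean):
    """Combine the corrected noise set and the clean set into D_E."""
    overlap = corrected_noise.keys() & clean.keys()
    if overlap:
        # D_N and D_C partition D_I, so no sample should appear in both.
        raise ValueError(f"samples in both sets: {sorted(overlap)}")
    return {**clean, **corrected_noise}
```

The resulting dictionary has the same keys as D_I, with only the labels of the former noise samples changed.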
In the above embodiment, a suitable classification algorithm can be chosen for building the classifier according to the type of data to be processed. For example, a Bayes classifier or a decision tree can be chosen for text data, and a support vector machine or a neural network for image data. The classifier is built by the standard procedure for machine learning classifiers.
The above embodiment does not limit the invention. The invention is not restricted to this embodiment; any implementation that meets the requirements of the invention falls within its scope of protection.
Claims (9)
1. A method for improving the quality of crowdsourced labeled data based on label noise correction, comprising the following steps:
(1) running a label integration algorithm on an initial crowdsourced labeled data set to form a label-integrated data set in which each sample obtains one integrated label, and estimating, during or after the label integration process, the quality of each annotator and the quality of each integrated sample label, wherein annotator quality is defined as the probability that the annotator provides the correct label, and the integrated label quality of a sample is defined as the probability that the integrated label equals the true label of the sample;
(2) performing multiple rounds of K-fold cross-validation on the label-integrated data set, and building a high-quality data set in each round of K-fold cross-validation, wherein the high-quality data set of each round is built as follows: before the round of cross-validation starts, the high-quality data set is set to empty; then, during cross-validation, the integrated label of each sample in the data set is checked against the label predicted for that sample in this round, and the sample is added to the high-quality data set if the two are consistent;
(3) using the predicted class probabilities of each sample's label in the multiple rounds of K-fold cross-validation, combined with the quality of the annotators and of the labels, determining label noise samples, and separating the label noise samples from the label-integrated data set to form a label noise data set, the remaining part forming a clean data set;
(4) training a classification model on the high-quality data sets produced during the multiple rounds of K-fold cross-validation, and using this model to predict again and replace the labels of the samples in the label noise data set;
(5) merging the processed label noise data set and the clean data set to form a quality-enhanced data set.
2. The method according to claim 1, including the label integration algorithm run on the initial crowdsourced labeled data set, characterized in that: the algorithm uses at least the labels given by crowdsourcing annotators to each sample in the data set to estimate the true label of each sample, and this estimated label is called the integrated label.
3. The method according to claim 1, including estimating the quality of each annotator, characterized in that: the estimate of annotator quality is either provided directly by the label integration algorithm or calculated from its results.
4. The method according to claim 1, including estimating the quality of each integrated sample label, characterized in that: the estimate of integrated label quality is either provided directly by the label integration algorithm or calculated from its results.
5. The method according to claim 1, including the step of identifying label noise samples, characterized in that: the K-fold cross-validation determines the probability that each sample's label belongs to each class; using the class probabilities predicted in the multiple rounds of K-fold cross-validation, the likelihood that each sample's label is noise is calculated and all samples are ranked by this likelihood; the number of noise samples is determined from the annotator quality and the integrated label quality combined with the likelihood that the sample labels are noise; and the label noise samples are determined from the number of noise samples and the ranking.
6. The method according to claim 1, including dividing the label-integrated data set into a noise data set and a clean data set, characterized in that: the clean data set is formed by removing the label noise samples from the label-integrated data set, and the labels of the samples in this data set no longer change in subsequent steps.
7. The method according to claim 1, including training a classification model on the high-quality data sets produced during the multiple rounds of K-fold cross-validation and using this model to predict again and replace the labels of the samples in the label noise data set, characterized in that: classification models are trained by supervised learning on one or more of the high-quality data sets to build one or more classifiers, and the labels of the label noise samples are predicted again and replaced either by one of the classifiers alone or by several classifiers in combination.
8. The method according to claim 1, including finally forming an enhanced data set, characterized in that: this data set is formed by merging the corrected noise data set and the clean data set; it contains the same samples as the label-integrated data set, but its integrated label quality is improved.
9. The method according to claim 7, characterized in that, in the method of training the classification model, a suitable supervised-learning-based classification model training algorithm is selected according to the field of the data set being processed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510754782.2A CN105426826A (en) | 2015-11-09 | 2015-11-09 | Tag noise correction based crowd-sourced tagging data quality improvement method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510754782.2A CN105426826A (en) | 2015-11-09 | 2015-11-09 | Tag noise correction based crowd-sourced tagging data quality improvement method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105426826A true CN105426826A (en) | 2016-03-23 |
Family
ID=55505026
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510754782.2A Pending CN105426826A (en) | 2015-11-09 | 2015-11-09 | Tag noise correction based crowd-sourced tagging data quality improvement method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105426826A (en) |
Cited By (37)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107067105A (en) * | 2017-04-07 | 2017-08-18 | 华东师范大学 | A kind of mass-rent strategy distribution method being grouped based on optimal data |
CN107808661A (en) * | 2017-10-23 | 2018-03-16 | 中央民族大学 | A kind of Tibetan voice corpus labeling method and system based on collaborative batch Active Learning |
CN107844740A (en) * | 2017-09-05 | 2018-03-27 | 中国地质调查局西安地质调查中心 | A kind of offline handwriting, printing Chinese character recognition methods and system |
CN107871196A (en) * | 2016-09-28 | 2018-04-03 | 郑州大学 | A kind of mass-rent method for evaluating quality based on slip task window |
CN108121814A (en) * | 2017-12-28 | 2018-06-05 | 北京百度网讯科技有限公司 | Search results ranking model generating method and device |
CN108197202A (en) * | 2017-12-28 | 2018-06-22 | 百度在线网络技术(北京)有限公司 | Data verification method, device, server and the storage medium of crowdsourcing task |
CN108446695A (en) * | 2018-02-06 | 2018-08-24 | 阿里巴巴集团控股有限公司 | Method, apparatus and electronic equipment for data mark |
CN108509969A (en) * | 2017-09-06 | 2018-09-07 | 腾讯科技(深圳)有限公司 | Data mask method and terminal |
CN108647858A (en) * | 2018-04-12 | 2018-10-12 | 华东师范大学 | A kind of collaboration crowdsourcing method of quality control based on user's inconsistency information |
CN108734296A (en) * | 2017-04-21 | 2018-11-02 | 北京京东尚科信息技术有限公司 | Optimize method, apparatus, electronic equipment and the medium of the training data of supervised learning |
CN108875821A (en) * | 2018-06-08 | 2018-11-23 | Oppo广东移动通信有限公司 | The training method and device of disaggregated model, mobile terminal, readable storage medium storing program for executing |
CN109086814A (en) * | 2018-07-23 | 2018-12-25 | 腾讯科技(深圳)有限公司 | A kind of data processing method, device and the network equipment |
CN109189767A (en) * | 2018-08-01 | 2019-01-11 | 北京三快在线科技有限公司 | Data processing method, device, electronic equipment and storage medium |
CN109214343A (en) * | 2018-09-14 | 2019-01-15 | 北京字节跳动网络技术有限公司 | Method and apparatus for generating face critical point detection model |
CN109241513A (en) * | 2018-08-27 | 2019-01-18 | 上海宝尊电子商务有限公司 | A kind of method and device based on big data crowdsourcing model data mark |
CN109272003A (en) * | 2017-07-17 | 2019-01-25 | 华东师范大学 | A kind of method and apparatus for eliminating unknown error in deep learning model |
CN109284315A (en) * | 2018-08-24 | 2019-01-29 | 大连莫比嗨客智能科技有限公司 | A kind of label data Statistical Inference under crowdsourcing model |
CN109376260A (en) * | 2018-09-26 | 2019-02-22 | 四川长虹电器股份有限公司 | A kind of method and system of deep learning image labeling |
CN109426834A (en) * | 2017-08-31 | 2019-03-05 | 佳能株式会社 | Information processing unit, information processing method and information processing system |
CN109543693A (en) * | 2018-11-28 | 2019-03-29 | 中国人民解放军国防科技大学 | Weak labeling data noise reduction method based on regularization label propagation |
CN110060247A (en) * | 2019-04-18 | 2019-07-26 | 深圳市深视创新科技有限公司 | Cope with the robust deep neural network learning method of sample marking error |
CN110084290A (en) * | 2019-04-12 | 2019-08-02 | 北京字节跳动网络技术有限公司 | Method, apparatus, electronic equipment and the computer readable storage medium of training classifier |
CN110163376A (en) * | 2018-06-04 | 2019-08-23 | 腾讯科技(深圳)有限公司 | Sample testing method, the recognition methods of media object, device, terminal and medium |
CN110363228A (en) * | 2019-06-26 | 2019-10-22 | 南京理工大学 | Noise label correcting method |
CN110580499A (en) * | 2019-08-20 | 2019-12-17 | 北京邮电大学 | deep learning target detection method and system based on crowdsourcing repeated labels |
CN110705607A (en) * | 2019-09-12 | 2020-01-17 | 西安交通大学 | Industry multi-label noise reduction method based on cyclic re-labeling self-service method |
CN110929807A (en) * | 2019-12-06 | 2020-03-27 | 腾讯科技(深圳)有限公司 | Training method of image classification model, and image classification method and device |
CN111288999A (en) * | 2020-02-19 | 2020-06-16 | 深圳大学 | Pedestrian road network attribute detection method, device and equipment based on mobile terminal |
CN111444937A (en) * | 2020-01-15 | 2020-07-24 | 湖州师范学院 | Crowdsourcing quality improvement method based on integrated TSK fuzzy classifier |
CN111814883A (en) * | 2020-07-10 | 2020-10-23 | 重庆大学 | Label noise correction method based on heterogeneous integration |
CN112000808A (en) * | 2020-09-29 | 2020-11-27 | 迪爱斯信息技术股份有限公司 | Data processing method and device and readable storage medium |
CN112148986A (en) * | 2020-10-09 | 2020-12-29 | 安徽大学 | Crowdsourcing-based top-N service re-recommendation method and system |
CN112988733A (en) * | 2021-04-16 | 2021-06-18 | 北京妙医佳健康科技集团有限公司 | Method and device for improving and enhancing data quality |
CN113139580A (en) * | 2021-03-23 | 2021-07-20 | 杭州电子科技大学 | Crowd-sourced data truth value reasoning method for integrated weighted majority soft voting |
CN113343695A (en) * | 2021-05-27 | 2021-09-03 | 镁佳(北京)科技有限公司 | Text labeling noise detection method and device, storage medium and electronic equipment |
CN113688949A (en) * | 2021-10-25 | 2021-11-23 | 南京码极客科技有限公司 | Network image data set denoising method based on dual-network joint label correction |
CN114611715A (en) * | 2022-05-12 | 2022-06-10 | 之江实验室 | Crowd-sourcing active learning method and device based on annotator reliability time sequence modeling |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103324620A (en) * | 2012-03-20 | 2013-09-25 | 北京百度网讯科技有限公司 | Method and device for rectifying marking results |
CN104573359A (en) * | 2014-12-31 | 2015-04-29 | 浙江大学 | Method for integrating crowdsource annotation data based on task difficulty and annotator ability |
CN104915388A (en) * | 2015-03-11 | 2015-09-16 | 浙江大学 | Book tag recommendation method based on spectral clustering and crowdsourcing technology |
Non-Patent Citations (1)
Title |
---|
JING ZHANG等: "Improving Label Quality in Crowdsourcing Using Noise Correction", 《CIKM2015》 * |
Cited By (60)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107871196A (en) * | 2016-09-28 | 2018-04-03 | 郑州大学 | A kind of mass-rent method for evaluating quality based on slip task window |
CN107067105A (en) * | 2017-04-07 | 2017-08-18 | 华东师范大学 | A kind of mass-rent strategy distribution method being grouped based on optimal data |
CN108734296A (en) * | 2017-04-21 | 2018-11-02 | 北京京东尚科信息技术有限公司 | Optimize method, apparatus, electronic equipment and the medium of the training data of supervised learning |
CN109272003A (en) * | 2017-07-17 | 2019-01-25 | 华东师范大学 | A kind of method and apparatus for eliminating unknown error in deep learning model |
US11636378B2 (en) | 2017-08-31 | 2023-04-25 | Canon Kabushiki Kaisha | Information processing apparatus, information processing method, and information processing system |
CN109426834B (en) * | 2017-08-31 | 2022-05-31 | 佳能株式会社 | Information processing apparatus, information processing method, and information processing system |
CN109426834A (en) * | 2017-08-31 | 2019-03-05 | 佳能株式会社 | Information processing unit, information processing method and information processing system |
CN107844740A (en) * | 2017-09-05 | 2018-03-27 | 中国地质调查局西安地质调查中心 | A kind of offline handwriting, printing Chinese character recognition methods and system |
CN108509969B (en) * | 2017-09-06 | 2021-11-09 | 腾讯科技(深圳)有限公司 | Data labeling method and terminal |
CN108509969A (en) * | 2017-09-06 | 2018-09-07 | 腾讯科技(深圳)有限公司 | Data mask method and terminal |
CN107808661B (en) * | 2017-10-23 | 2020-12-11 | 中央民族大学 | Tibetan language voice corpus labeling method and system based on collaborative batch active learning |
CN107808661A (en) * | 2017-10-23 | 2018-03-16 | 中央民族大学 | A kind of Tibetan voice corpus labeling method and system based on collaborative batch Active Learning |
CN108197202B (en) * | 2017-12-28 | 2021-12-24 | 百度在线网络技术(北京)有限公司 | Data verification method and device for crowdsourcing task, server and storage medium |
CN108121814B (en) * | 2017-12-28 | 2022-04-22 | 北京百度网讯科技有限公司 | Search result ranking model generation method and device |
CN108197202A (en) * | 2017-12-28 | 2018-06-22 | 百度在线网络技术(北京)有限公司 | Data verification method, device, server and the storage medium of crowdsourcing task |
CN108121814A (en) * | 2017-12-28 | 2018-06-05 | 北京百度网讯科技有限公司 | Search results ranking model generating method and device |
CN108446695A (en) * | 2018-02-06 | 2018-08-24 | 阿里巴巴集团控股有限公司 | Method, apparatus and electronic equipment for data mark |
CN108446695B (en) * | 2018-02-06 | 2022-02-11 | 创新先进技术有限公司 | Method and device for data annotation and electronic equipment |
CN108647858A (en) * | 2018-04-12 | 2018-10-12 | 华东师范大学 | A kind of collaboration crowdsourcing method of quality control based on user's inconsistency information |
CN110163376B (en) * | 2018-06-04 | 2023-11-03 | 腾讯科技(深圳)有限公司 | Sample detection method, media object identification method, device, terminal and medium |
CN110163376A (en) * | 2018-06-04 | 2019-08-23 | 腾讯科技(深圳)有限公司 | Sample testing method, the recognition methods of media object, device, terminal and medium |
CN108875821A (en) * | 2018-06-08 | 2018-11-23 | Oppo广东移动通信有限公司 | The training method and device of disaggregated model, mobile terminal, readable storage medium storing program for executing |
US11138478B2 (en) | 2018-06-08 | 2021-10-05 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Method and apparatus for training, classification model, mobile terminal, and readable storage medium |
CN109086814A (en) * | 2018-07-23 | 2018-12-25 | 腾讯科技(深圳)有限公司 | A kind of data processing method, device and the network equipment |
CN109086814B (en) * | 2018-07-23 | 2021-05-14 | 腾讯科技(深圳)有限公司 | Data processing method and device and network equipment |
CN109189767A (en) * | 2018-08-01 | 2019-01-11 | 北京三快在线科技有限公司 | Data processing method, device, electronic equipment and storage medium |
CN109189767B (en) * | 2018-08-01 | 2021-07-23 | 北京三快在线科技有限公司 | Data processing method and device, electronic equipment and storage medium |
CN109284315A (en) * | 2018-08-24 | 2019-01-29 | 大连莫比嗨客智能科技有限公司 | Label data statistical inference method in crowdsourcing mode |
CN109284315B (en) * | 2018-08-24 | 2021-04-23 | 深圳莫比嗨客树莓派智能机器人有限公司 | Label data statistical inference method in crowdsourcing mode |
CN109241513A (en) * | 2018-08-27 | 2019-01-18 | 上海宝尊电子商务有限公司 | Data annotation method and device based on big data crowdsourcing mode |
CN109214343A (en) * | 2018-09-14 | 2019-01-15 | 北京字节跳动网络技术有限公司 | Method and apparatus for generating face critical point detection model |
CN109376260B (en) * | 2018-09-26 | 2021-10-01 | 四川长虹电器股份有限公司 | Method and system for deep learning image annotation |
CN109376260A (en) * | 2018-09-26 | 2019-02-22 | 四川长虹电器股份有限公司 | Method and system for deep learning image annotation |
CN109543693A (en) * | 2018-11-28 | 2019-03-29 | 中国人民解放军国防科技大学 | Weak labeling data noise reduction method based on regularization label propagation |
CN109543693B (en) * | 2018-11-28 | 2021-05-07 | 中国人民解放军国防科技大学 | Weak labeling data noise reduction method based on regularization label propagation |
CN110084290A (en) * | 2019-04-12 | 2019-08-02 | 北京字节跳动网络技术有限公司 | Method, apparatus, electronic device and computer-readable storage medium for training classifier |
CN110084290B (en) * | 2019-04-12 | 2021-03-05 | 北京字节跳动网络技术有限公司 | Method, apparatus, electronic device and computer-readable storage medium for training classifier |
CN110060247A (en) * | 2019-04-18 | 2019-07-26 | 深圳市深视创新科技有限公司 | Robust deep neural network learning method for coping with sample labeling errors |
CN110363228A (en) * | 2019-06-26 | 2019-10-22 | 南京理工大学 | Noise label correction method |
CN110363228B (en) * | 2019-06-26 | 2022-09-06 | 南京理工大学 | Noise label correction method |
CN110580499B (en) * | 2019-08-20 | 2022-05-24 | 北京邮电大学 | Deep learning target detection method and system based on crowdsourcing repeated labels |
CN110580499A (en) * | 2019-08-20 | 2019-12-17 | 北京邮电大学 | Deep learning target detection method and system based on crowdsourcing repeated labels |
CN110705607A (en) * | 2019-09-12 | 2020-01-17 | 西安交通大学 | Industry multi-label noise reduction method based on cyclic re-labeling self-service method |
CN110705607B (en) * | 2019-09-12 | 2022-10-25 | 西安交通大学 | Industry multi-label noise reduction method based on cyclic re-labeling self-service method |
CN110929807A (en) * | 2019-12-06 | 2020-03-27 | 腾讯科技(深圳)有限公司 | Training method of image classification model, and image classification method and device |
CN111444937B (en) * | 2020-01-15 | 2023-05-12 | 湖州师范学院 | Crowd-sourced quality improvement method based on integrated TSK fuzzy classifier |
CN111444937A (en) * | 2020-01-15 | 2020-07-24 | 湖州师范学院 | Crowdsourcing quality improvement method based on integrated TSK fuzzy classifier |
CN111288999B (en) * | 2020-02-19 | 2021-08-31 | 深圳大学 | Pedestrian road network attribute detection method, device and equipment based on mobile terminal |
CN111288999A (en) * | 2020-02-19 | 2020-06-16 | 深圳大学 | Pedestrian road network attribute detection method, device and equipment based on mobile terminal |
CN111814883A (en) * | 2020-07-10 | 2020-10-23 | 重庆大学 | Label noise correction method based on heterogeneous integration |
CN112000808B (en) * | 2020-09-29 | 2024-04-16 | 迪爱斯信息技术股份有限公司 | Data processing method and device and readable storage medium |
CN112000808A (en) * | 2020-09-29 | 2020-11-27 | 迪爱斯信息技术股份有限公司 | Data processing method and device and readable storage medium |
CN112148986B (en) * | 2020-10-09 | 2022-09-30 | 安徽大学 | Top-N service re-recommendation method and system based on crowdsourcing |
CN112148986A (en) * | 2020-10-09 | 2020-12-29 | 安徽大学 | Crowdsourcing-based top-N service re-recommendation method and system |
CN113139580A (en) * | 2021-03-23 | 2021-07-20 | 杭州电子科技大学 | Crowd-sourced data truth value reasoning method for integrated weighted majority soft voting |
CN112988733A (en) * | 2021-04-16 | 2021-06-18 | 北京妙医佳健康科技集团有限公司 | Method and device for improving and enhancing data quality |
CN113343695A (en) * | 2021-05-27 | 2021-09-03 | 镁佳(北京)科技有限公司 | Text labeling noise detection method and device, storage medium and electronic equipment |
CN113688949A (en) * | 2021-10-25 | 2021-11-23 | 南京码极客科技有限公司 | Network image data set denoising method based on dual-network joint label correction |
CN114611715B (en) * | 2022-05-12 | 2022-08-23 | 之江实验室 | Crowd-sourcing active learning method and device based on annotator reliability time sequence modeling |
CN114611715A (en) * | 2022-05-12 | 2022-06-10 | 之江实验室 | Crowd-sourcing active learning method and device based on annotator reliability time sequence modeling |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105426826A (en) | Tag noise correction based crowd-sourced tagging data quality improvement method | |
CN108376267B (en) | Zero sample classification method based on class transfer | |
CN108052862B (en) | Age estimation method and device | |
CN109376796A (en) | Image classification method based on active semi-supervised learning | |
CN109086825A (en) | Multi-classification model fusion method based on adaptive model selection | |
CN105825233B (en) | Pedestrian detection method based on online-learning random fern classifier | |
CN105095884A (en) | Pedestrian recognition system and pedestrian recognition processing method based on random forest support vector machine | |
CN110363228B (en) | Noise label correction method | |
CN110705607A (en) | Industry multi-label noise reduction method based on cyclic re-labeling self-service method | |
CN110751191A (en) | Image classification method and system | |
WO2020024444A1 (en) | Group performance grade recognition method and apparatus, and storage medium and computer device | |
CN111241987B (en) | Multi-target model visual tracking method based on cost-sensitive three-branch decision | |
CN111861679A (en) | Commodity recommendation method based on artificial intelligence | |
Arbel et al. | Classifier evaluation under limited resources | |
CN110807601A (en) | Park road deterioration analysis method based on truncation data | |
CN107169830B (en) | Personalized recommendation method based on clustering PU matrix decomposition | |
CN111914772B (en) | Age identification method, age identification model training method and device | |
CN110188791A (en) | Visual emotion label distribution prediction method based on automatic estimation | |
CN110490053B (en) | Human face attribute identification method based on trinocular camera depth estimation | |
CN106611036A (en) | Improved multidimensional scaling heterogeneous cost-sensitive decision tree building method | |
CN110349119B (en) | Pavement disease detection method and device based on edge detection neural network | |
CN112541010A (en) | User gender prediction method based on logistic regression | |
CN115730152A (en) | Big data processing method and big data processing system based on user portrait analysis | |
CN112508135B (en) | Model training method, pedestrian attribute prediction method, device and equipment | |
CN115511012A (en) | Class soft label recognition training method for maximum entropy constraint |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20160323 |