CN112396279B - Robust crowdsourcing data analysis method based on trust model - Google Patents

Info

Publication number
CN112396279B
Authority
CN
China
Prior art keywords
crowdsourcing
accuracy
task
workers
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010551752.2A
Other languages
Chinese (zh)
Other versions
CN112396279A (en)
Inventor
孙杰
焦玉全
吴礼发
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN202010551752.2A
Publication of CN112396279A
Application granted
Publication of CN112396279B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06395 Quality analysis or management

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Educational Administration (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a robust crowdsourcing data analysis method based on a trust model. The method first analyzes the historical reputation information of crowdsourcing workers using a beta distribution, then analyzes the task results of part of the crowdsourcing workers' submissions in the current task using a voting consistency rule, and finally uses a Bayesian model to predict the accuracy of the result data that the crowdsourcing workers provide in the current task.

Description

Robust crowdsourcing data analysis method based on trust model
Technical Field
The invention relates to the field of recommendation algorithms, in particular to a robust crowdsourcing data analysis method based on a trust model.
Background
Some existing recommendation algorithms analyze the crowdsourcing result data provided by crowdsourcing workers (the parties who complete crowdsourcing tasks) according to the employer's requirements, screen out high-quality result data, and recommend it to users. These algorithms judge result quality from an evaluation of each worker's historical behavior data. Therefore, once a crowdsourcing worker's working state changes during a task, the accuracy evaluated from the worker's historical behavior data deviates greatly from the accuracy of the result data the worker submits in that task, and the actual accuracy of the submitted result data is lower than estimated in advance. How to let the employer obtain highly accurate crowdsourcing result data under any conditions has therefore become a difficulty in current research on crowdsourcing data quality.
Screening high-quality crowdsourcing data is thus a difficult problem that urgently needs to be solved in the crowdsourcing field. The present invention addresses this problem.
Disclosure of Invention
Purpose of the invention: the invention provides a robust crowdsourcing data analysis method based on a trust model, which can screen crowdsourcing data more efficiently and accurately.
Technical scheme: the robust crowdsourcing data analysis method based on a trust model according to the invention comprises the following steps:
S1: extracting historical reputation information of crowdsourcing workers from the basic worker information provided by the crowdsourcing platform;
S2: analyzing the historical reputation information provided by S1 using a beta distribution to obtain the prior distribution of the crowdsourcing workers' work accuracy;
S3: after the task issued by the employer is completed, randomly selecting part of the crowdsourcing workers' task results from the data set;
S4: analyzing the partial crowdsourcing task result data provided by S3 using a voting consistency rule to obtain the conditional probability of the crowdsourcing data accuracy;
S5: combining the prior information obtained in S2 with the conditional probability obtained in S4 through a Bayesian model to obtain the posterior accuracy information of the data results submitted by the crowdsourcing workers in the current task.
Preferably, the historical reputation information extracted in S1 is the accuracy information of the k task results from the continuous period of time in which the crowdsourcing worker was most stable.
Preferably, analyzing the historical reputation information with the beta distribution in S2 specifically comprises the following steps:
S2.1: the accuracy a of the results provided by a worker denotes the accuracy of the tasks completed by that worker; the prior distribution of a is g(a_m), m ∈ (1, 2, 3, …, m), where there are m workers in total.
S2.2: the probability density function of the beta distribution is:
$$f(x;\alpha,\beta)=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,x^{\alpha-1}(1-x)^{\beta-1},\quad 0<x<1$$
where α and β are the two parameters of the beta distribution and Γ is the gamma function:
$$\Gamma(\alpha)=\int_{0}^{\infty} t^{\alpha-1}e^{-t}\,dt;$$
$$\Gamma(\alpha+1)=\alpha\,\Gamma(\alpha),\quad \alpha>0;$$
when n is a positive integer:
$$\Gamma(n)=(n-1)!;$$
the mean of the beta distribution is calculated as:
$$E(x)=\frac{\alpha}{\alpha+\beta}$$
the variance of the beta distribution is calculated as:
$$\mathrm{Var}(x)=\frac{\alpha\beta}{(\alpha+\beta)^{2}(\alpha+\beta+1)}$$
S2.3: from the historical reputation information provided by S1, the mean of the accuracy of the k task results is obtained, calculated as:
Figure RE-GDA0002676790300000024
the variance of the accuracy of the k task results is likewise obtained, calculated as:
Figure RE-GDA0002676790300000025
S2.4: equating the mean and variance of the beta distribution with the accuracy mean and accuracy variance of the k tasks gives a system of equations that can be solved for α and β, where α is calculated as:
Figure RE-GDA0002676790300000026
and β is calculated as:
Figure RE-GDA0002676790300000027
S2.5: substituting the values of α and β obtained in S2.4 into the probability density function of the beta distribution gives the prior distribution of the accuracy of the data provided by the crowdsourcing workers.
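The moment-matching step in S2.3 and S2.4 can be worked through as follows. This is a sketch of the standard method-of-moments fit for a beta distribution, not a transcription of the formula images: the symbols $\bar{a}$ and $s^2$ for the mean and variance of the k historical accuracies $a_1,\dots,a_k$ are assumed notation, and the $1/k$ normalization of the variance is an assumption.

```latex
\begin{align*}
\bar{a} &= \frac{1}{k}\sum_{i=1}^{k} a_i, &
s^2 &= \frac{1}{k}\sum_{i=1}^{k}\left(a_i-\bar{a}\right)^2 \\
\frac{\alpha}{\alpha+\beta} &= \bar{a}, &
\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)} &= s^2 \\
\Rightarrow\quad \alpha+\beta &= \frac{\bar{a}(1-\bar{a})}{s^2}-1 \\
\alpha &= \bar{a}\left(\frac{\bar{a}(1-\bar{a})}{s^2}-1\right), &
\beta &= (1-\bar{a})\left(\frac{\bar{a}(1-\bar{a})}{s^2}-1\right)
\end{align*}
```

The solution only yields positive α and β when $0 < s^2 < \bar{a}(1-\bar{a})$, so a worker whose k accuracies are all identical cannot be fitted this way without some additional handling.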
Preferably, the data set in S3 consists of all task results submitted by all workers in the task.
Preferably, the voting consistency method comprises the following steps:
S4.1: preprocessing the data to determine whether each crowdsourcing worker's submitted task results are correct or incorrect;
S4.2: setting a threshold according to requirements and judging whether the difference between each crowdsourcing worker's submitted result and the average result is within the threshold; if so, the result meets the requirement, otherwise it does not;
S4.3: calculating, from the judgment results, the accuracy of each worker's extracted partial task results;
S4.4: finally, from the probability density function of the beta distribution and the accuracy of each worker's extracted partial task results, obtaining the conditional probability of the accuracy of the results submitted by the crowdsourcing workers in the task.
Preferably, the posterior accuracy information of the data results submitted by the crowdsourcing workers in the current task is calculated as:
Figure RE-GDA0002676790300000031
Beneficial effects: the method predicts the accuracy of the result data provided by crowdsourcing workers in the current task by combining a beta distribution, a voting consistency rule, and a Bayesian model, and can provide good recommendation precision when analyzing the crowdsourcing result data of large data sets.
Drawings
Fig. 1 is a schematic representation of the beta distribution prior employed in the method of the present invention.
Detailed Description
The invention is further described in detail in the following with reference to the accompanying drawings.
In this embodiment, the employer issues 400 common-sense questions for m crowdsourcing workers to answer, and each worker's answer to each question is taken as a task result, together forming the Common Sense Question data set (CSQ). The employer can determine the accuracy of the data results provided by the crowdsourcing workers through the following steps.
Step 1: from the basic worker information provided by the crowdsourcing platform, extract the accuracy information of the k task results from the continuous period of time in which each crowdsourcing worker was most stable, and use it as the historical reputation information.
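Step 1 can be sketched in Python as a sliding-window search. The text does not define "most stable", so this sketch assumes it means the k consecutive task accuracies with the smallest variance; the function name `most_stable_window` and the sample history are hypothetical.

```python
from statistics import pvariance


def most_stable_window(accuracies, k):
    """Return the k consecutive accuracies with the smallest variance.

    Assumption: "most stable" is interpreted as minimum population variance
    over a window of k consecutive tasks; the source does not define it.
    """
    if k <= 0 or k > len(accuracies):
        raise ValueError("k must be between 1 and len(accuracies)")
    best_start = min(
        range(len(accuracies) - k + 1),
        key=lambda start: pvariance(accuracies[start:start + k]),
    )
    return accuracies[best_start:best_start + k]


# Hypothetical per-task accuracy history for one worker.
history = [0.60, 0.90, 0.88, 0.91, 0.89, 0.40, 0.95]
window = most_stable_window(history, k=4)
```

Other stability criteria (for example the smallest range, or the most recent stable run) would fit the same interface.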
Step 2: as shown in Fig. 1, the historical reputation information provided by step 1 is analyzed using a beta distribution to obtain the prior distribution of the crowdsourcing workers' work accuracy, specifically comprising the following steps:
S2.1: the accuracy a of the results provided by a worker denotes the accuracy of the tasks completed by that worker; the prior distribution of a is g(a_m), m ∈ (1, 2, 3, …, m), where there are m workers in total.
S2.2: the probability density function of the beta distribution is:
$$f(x;\alpha,\beta)=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,x^{\alpha-1}(1-x)^{\beta-1},\quad 0<x<1$$
where α and β are the two parameters of the beta distribution and Γ is the gamma function:
$$\Gamma(\alpha)=\int_{0}^{\infty} t^{\alpha-1}e^{-t}\,dt;$$
$$\Gamma(\alpha+1)=\alpha\,\Gamma(\alpha),\quad \alpha>0;$$
when n is a positive integer:
$$\Gamma(n)=(n-1)!;$$
the mean of the beta distribution is calculated as:
$$E(x)=\frac{\alpha}{\alpha+\beta}$$
the variance of the beta distribution is calculated as:
$$\mathrm{Var}(x)=\frac{\alpha\beta}{(\alpha+\beta)^{2}(\alpha+\beta+1)}$$
S2.3: from the historical reputation information provided by step 1, the mean of the accuracy of the k task results is obtained, calculated as:
Figure RE-GDA0002676790300000043
the variance of the accuracy of the k task results is likewise obtained, calculated as:
Figure RE-GDA0002676790300000044
S2.4: equating the mean and variance of the beta distribution with the accuracy mean and accuracy variance of the k tasks gives a system of equations that can be solved for α and β, where α is calculated as:
Figure RE-GDA0002676790300000045
and β is calculated as:
Figure RE-GDA0002676790300000046
S2.5: substituting the values of α and β obtained in S2.4 into the probability density function of the beta distribution gives the prior distribution of the accuracy of the data provided by the crowdsourcing workers.
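Steps S2.3 to S2.5 can be sketched in Python as a method-of-moments fit followed by evaluation of the beta density. This is a minimal sketch rather than the patented implementation: the function names `fit_beta_prior` and `beta_pdf`, the sample accuracies, and the 1/k variance normalization are all assumptions.

```python
import math
from statistics import fmean, pvariance


def fit_beta_prior(accuracies):
    """Method-of-moments fit of Beta(alpha, beta) to historical accuracies."""
    mean = fmean(accuracies)
    var = pvariance(accuracies, mu=mean)  # 1/k normalization (assumed)
    # A valid fit only exists when 0 < var < mean * (1 - mean).
    if not 0.0 < var < mean * (1.0 - mean):
        raise ValueError("accuracies cannot be matched by a beta distribution")
    common = mean * (1.0 - mean) / var - 1.0  # equals alpha + beta
    return mean * common, (1.0 - mean) * common


def beta_pdf(x, alpha, beta):
    """Beta density at 0 < x < 1, computed in log space for stability."""
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return math.exp(
        log_norm + (alpha - 1.0) * math.log(x) + (beta - 1.0) * math.log(1.0 - x)
    )


# Hypothetical accuracies of k = 4 historical task results for one worker.
alpha, beta = fit_beta_prior([0.90, 0.88, 0.91, 0.89])
prior_density_at_mean = beta_pdf(0.895, alpha, beta)
```

Working in log space with `math.lgamma` avoids overflow in the gamma function when a very consistent worker produces large α and β.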
Step 3: from the data set CSQ, the results for n (n < 400) questions provided by the m workers are randomly extracted, with the same questions extracted for all workers. The extracted results may be expressed as: $$R_{n,m}=(r_{1,1},r_{1,2},r_{1,3},\dots,r_{1,m};\ r_{2,1},r_{2,2},r_{2,3},\dots,r_{2,m};\ \dots;\ r_{n,1},r_{n,2},r_{n,3},\dots,r_{n,m})$$ where n denotes the number of questions, m denotes the number of workers, and $r_{1,1},r_{1,2},r_{1,3},\dots,r_{1,m}$ are the answers of all crowdsourcing workers to the 1st question.
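The extraction in step 3 can be sketched as drawing one random set of question indices and applying it to every worker. In this hypothetical layout `results[q][w]` holds worker w's answer to question q; the function name `sample_questions` and the optional `seed` parameter are assumptions added for reproducibility.

```python
import random


def sample_questions(results, n, seed=None):
    """Randomly pick n questions; every worker is sampled on the same ones.

    results[q][w] is worker w's answer to question q (assumed layout).
    Returns the chosen question indices and the corresponding rows.
    """
    rng = random.Random(seed)
    indices = sorted(rng.sample(range(len(results)), n))
    return indices, [results[q] for q in indices]


# Hypothetical data: 400 questions answered by m = 3 workers.
results = [[float(q), float(q) + 0.1, float(q) - 0.1] for q in range(400)]
indices, sampled = sample_questions(results, n=50, seed=0)
```

Sampling whole rows keeps the extracted questions identical across workers, as the step requires.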
Step 4: the partial crowdsourcing task result data provided by step 3 are analyzed using the voting consistency rule to obtain the conditional probability of the crowdsourcing data accuracy, specifically comprising the following steps:
S4.1: preprocess the data. From the task results provided in step 3, obtain the average value of the m workers' results for the nth question, calculated as:
Figure RE-GDA0002676790300000051
where r_{n,m} represents the result of the mth worker answering the nth question.
After the average value is calculated, whether the task results submitted by the m crowdsourcing workers are correct or incorrect is judged against the average value, calculated as:
Figure RE-GDA0002676790300000052
S4.2: according to requirements, let ω be the threshold for judging task accuracy. If the difference between r_{n,m} and the average value
Figure RE-GDA0002676790300000053
is less than the set threshold, the worker's answer to the question meets the requirement; otherwise it does not and is regarded as an incorrect answer;
S4.3: from the judgment results, calculate the accuracy of each worker's extracted partial task results, calculated as:
Figure RE-GDA0002676790300000054
S4.4: finally, from the probability density function of the beta distribution and the accuracy of each worker's extracted partial task results, the conditional probability of the accuracy of the results submitted by the crowdsourcing workers in the task is obtained, calculated as:
Figure RE-GDA0002676790300000055
where τ is a preset parameter with 0 < τ < 1, and g(a_m) represents the credibility of the m workers participating in the task, that is, the comprehensive credibility information obtained from the historical task reputation information.
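Steps S4.1 to S4.3 of the voting consistency rule can be sketched as follows. The sketch assumes numeric answers, treats an answer as correct when its absolute difference from the per-question average is below ω, and takes a worker's accuracy to be the fraction of sampled questions judged correct; the function name `worker_accuracies` and the sample data are hypothetical. The conditional probability of S4.4, which involves τ and g(a_m), is not reproduced here because its formula is not available in the text.

```python
from statistics import fmean


def worker_accuracies(sampled, omega):
    """Per-worker accuracy on the sampled questions under voting consistency.

    sampled[q][w] is worker w's numeric answer to sampled question q.
    An answer counts as correct when |answer - question average| < omega.
    """
    n_questions = len(sampled)
    n_workers = len(sampled[0])
    correct = [0] * n_workers
    for answers in sampled:
        consensus = fmean(answers)  # S4.1: average over the m workers
        for w, answer in enumerate(answers):
            if abs(answer - consensus) < omega:  # S4.2: threshold test
                correct[w] += 1
    # S4.3: fraction of sampled questions judged correct (assumed form).
    return [count / n_questions for count in correct]


# Hypothetical answers of 3 workers to 4 sampled questions.
sampled = [
    [10.0, 10.2, 14.0],
    [5.0, 5.1, 5.0],
    [7.0, 7.1, 3.0],
    [2.0, 2.0, 2.1],
]
acc = worker_accuracies(sampled, omega=1.5)
```

Because the consensus is a plain mean, a single outlier shifts it for everyone, so ω has to be chosen with the expected spread of honest answers in mind.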
Step 5: the prior information obtained in step 2 and the conditional probability obtained in step 4 are combined through a Bayesian model to obtain the accuracy of the task result data provided by the crowdsourcing workers in the current crowdsourcing task, calculated as:
Figure RE-GDA0002676790300000061
The accuracy given by this formula represents the accuracy of the task results provided by the crowdsourcing workers in the current task.
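To illustrate how a prior and current-task evidence combine, the sketch below uses the textbook conjugate Beta-Binomial update. This is a stand-in chosen for illustration and is not the formula of step 5, which is not reproduced in the text and involves the preset parameter τ. The function name `posterior_accuracy` and the numbers are hypothetical.

```python
def posterior_accuracy(alpha, beta, n_correct, n_sampled):
    """Posterior mean accuracy under a Beta(alpha, beta) prior.

    Stand-in model (assumption): the sampled answers are treated as
    n_sampled Bernoulli trials with n_correct successes, so the posterior
    is Beta(alpha + n_correct, beta + n_sampled - n_correct).
    """
    if not 0 <= n_correct <= n_sampled:
        raise ValueError("n_correct must lie between 0 and n_sampled")
    post_alpha = alpha + n_correct
    post_beta = beta + (n_sampled - n_correct)
    return post_alpha / (post_alpha + post_beta)


# Prior mean 0.9 from history, but only 5 of 10 sampled answers judged correct.
estimate = posterior_accuracy(alpha=9.0, beta=1.0, n_correct=5, n_sampled=10)
```

In this example the estimate moves from the historical 0.9 toward the observed 0.5, which is the behavior the method relies on when a worker's state changes during the current task.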
The invention mainly aims at the problems faced by the existing crowdsourcing data screening method, and provides a robust crowdsourcing data analysis method based on a trust model.
In the above embodiments, the present invention has been described only by way of example, but various modifications can be made by those skilled in the art after reading the present application without departing from the spirit and scope of the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention shall be included in the protection scope of the present invention.

Claims (5)

1. A robust crowdsourcing data analysis method based on a trust model, characterized by comprising the following steps:
S1: extracting historical reputation information of crowdsourcing workers from the basic worker information provided by the crowdsourcing platform;
S2: analyzing the historical reputation information provided by S1 using a beta distribution to obtain the prior distribution of the crowdsourcing workers' work accuracy, specifically comprising the following steps:
S2.1: the accuracy a of the results provided by a worker denotes the accuracy of the tasks completed by that worker; the prior distribution of a is g(a_m), m ∈ (1, 2, 3, …, m), where there are m workers in total;
S2.2: the probability density function of the beta distribution is:
$$f(x;\alpha,\beta)=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,x^{\alpha-1}(1-x)^{\beta-1},\quad 0<x<1$$
where α and β are the two parameters of the beta distribution and Γ is the gamma function:
$$\Gamma(\alpha)=\int_{0}^{\infty} t^{\alpha-1}e^{-t}\,dt;$$
$$\Gamma(\alpha+1)=\alpha\,\Gamma(\alpha),\quad \alpha>0;$$
when n is a positive integer:
$$\Gamma(n)=(n-1)!;$$
the mean of the beta distribution is calculated as:
$$E(x)=\frac{\alpha}{\alpha+\beta}$$
the variance of the beta distribution is calculated as:
$$\mathrm{Var}(x)=\frac{\alpha\beta}{(\alpha+\beta)^{2}(\alpha+\beta+1)}$$
S2.3: from the historical reputation information provided by S1, obtaining the mean of the accuracy of the k task results, calculated as:
Figure FDA0003730062020000014
and likewise obtaining the variance of the accuracy of the k task results, calculated as:
Figure FDA0003730062020000015
S2.4: equating the mean and variance of the beta distribution with the accuracy mean and accuracy variance of the k tasks to give a system of equations that is solved for α and β, where α is calculated as:
Figure FDA0003730062020000016
and β is calculated as:
Figure FDA0003730062020000021
S2.5: substituting the values of α and β obtained in S2.4 into the probability density function of the beta distribution to obtain the prior distribution of the accuracy of the data provided by the crowdsourcing workers;
S3: after the task issued by the employer is completed, randomly selecting part of the crowdsourcing workers' task results from the data set;
S4: analyzing the partial crowdsourcing task result data provided by S3 using a voting consistency rule to obtain the conditional probability of the crowdsourcing data accuracy;
S5: combining the prior information obtained in S2 with the conditional probability obtained in S4 through a Bayesian model to obtain the posterior accuracy information of the data results submitted by the crowdsourcing workers in the current task.
2. The robust crowdsourcing data analysis method based on a trust model of claim 1, wherein: the historical reputation information extracted in S1 is the accuracy information of the k task results from the continuous period of time in which the crowdsourcing worker was most stable.
3. The robust crowdsourcing data analysis method based on a trust model of claim 1, wherein: the data set in S3 consists of all task results submitted by all workers in the task.
4. The robust crowdsourcing data analysis method based on a trust model of claim 1, wherein: the voting consistency method comprises the following steps:
S4.1: preprocessing the data to determine whether each crowdsourcing worker's submitted task results are correct or incorrect;
S4.2: setting a threshold according to requirements and judging whether the difference between each crowdsourcing worker's submitted result and the average result is within the threshold; if so, the result meets the requirement, otherwise it does not;
S4.3: calculating, from the judgment results, the accuracy of each worker's extracted partial task results;
S4.4: finally obtaining the data that meet the requirements, and obtaining, from the probability density function of the beta distribution and the accuracy of each worker's extracted partial task results, the conditional probability of the accuracy of the results submitted by the crowdsourcing workers in the task.
5. The robust crowdsourcing data analysis method based on a trust model of claim 1, wherein: the posterior accuracy information of the data results submitted by the crowdsourcing workers in the current task is calculated as:
Figure FDA0003730062020000022
CN202010551752.2A 2020-06-17 2020-06-17 Robust crowdsourcing data analysis method based on trust model Active CN112396279B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010551752.2A CN112396279B (en) 2020-06-17 2020-06-17 Robust crowdsourcing data analysis method based on trust model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010551752.2A CN112396279B (en) 2020-06-17 2020-06-17 Robust crowdsourcing data analysis method based on trust model

Publications (2)

Publication Number Publication Date
CN112396279A CN112396279A (en) 2021-02-23
CN112396279B true CN112396279B (en) 2022-08-26

Family

ID=74603840

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010551752.2A Active CN112396279B (en) 2020-06-17 2020-06-17 Robust crowdsourcing data analysis method based on trust model

Country Status (1)

Country Link
CN (1) CN112396279B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108133330A (en) * 2018-01-12 2018-06-08 东北大学 One kind is towards social crowdsourcing method for allocating tasks and its system

Also Published As

Publication number Publication date
CN112396279A (en) 2021-02-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant