CN112396279A - Robust crowdsourcing data analysis method based on trust model

Info

Publication number: CN112396279A
Application number: CN202010551752.2A
Authority: CN
Inventors: 孙杰; 焦玉全; 吴礼发
Original assignee: Nanjing University of Posts and Telecommunications
Current assignee: Nanjing University of Posts and Telecommunications
Priority date: 2020-06-17
Filing date: 2020-06-17
Publication date: 2021-02-23
Anticipated expiration: 2040-06-17
Also published as: CN112396279B

Abstract

The invention discloses a robust crowdsourcing data analysis method based on a trust model. The method first analyzes the historical reputation information of crowdsourcing workers using a beta distribution, then analyzes a portion of the task results that the crowdsourcing workers submit in the current task using a voting consistency rule, and finally uses a Bayesian algorithm to predict the accuracy of the result data that the crowdsourcing workers provide in the current task.

Description

Robust crowdsourcing data analysis method based on trust model

Technical Field

The invention relates to the field of recommendation algorithms, in particular to a robust crowdsourcing data analysis method based on a trust model.

Background

Some existing recommendation algorithms can analyze the crowdsourcing result data provided by crowdsourcing workers (the parties who complete crowdsourcing tasks) according to the employer's requirements, screen out the high-quality result data, and recommend it to users. However, once a crowdsourcing worker's working state changes during a task, the accuracy estimated from the worker's historical behavior data can deviate greatly from the accuracy of the result data the worker actually submits, and the actual accuracy of the submitted data turns out lower than estimated in advance. How to let the employer obtain highly accurate crowdsourcing result data under any conditions has therefore become a difficulty in current research on crowdsourcing data quality.

Screening high-quality crowdsourcing data is therefore a difficult problem that urgently needs to be solved in the crowdsourcing field. The present invention addresses this problem.

Disclosure of Invention

Purpose of the invention: the invention provides a robust crowdsourcing data analysis method based on a trust model, which can screen crowdsourcing data more efficiently and accurately.

Technical scheme: the robust crowdsourcing data analysis method based on a trust model according to the invention comprises the following steps:

S1: Extract the historical reputation information of crowdsourcing workers from the basic worker information provided by the crowdsourcing platform;

S2: Analyze the historical reputation information from S1 using a beta distribution to obtain the prior distribution of the crowdsourcing workers' work accuracy;

S3: After the task issued by the employer is completed, randomly select a portion of the crowdsourcing workers' task results from the data set;

S4: Analyze the partial task result data from S3 using a voting consistency rule to obtain the conditional probability of the crowdsourcing data accuracy;

S5: Combine the prior information obtained in S2 and the conditional probability obtained in S4 through a Bayesian model to obtain the posterior accuracy information of the data results submitted by the crowdsourcing workers in the current task.

Preferably, the historical reputation information extracted in S1 is the accuracy information of k task results from the continuous period of time in which the crowdsourcing worker's performance is most stable.

Preferably, analyzing the historical reputation information with the beta distribution in S2 specifically includes the following steps:

S2.1: The accuracy a of the results provided by a worker represents the accuracy with which the worker completes tasks. The prior distribution of a is $g(a_m)$, m = (1, 2, 3, …, m), where there are m workers in total.

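S2.2: The probability density function of the beta distribution is:

$$f(x;\alpha,\beta)=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,x^{\alpha-1}(1-x)^{\beta-1},\quad 0<x<1$$

where α and β are the two parameters of the beta distribution and Γ is the gamma function:

$$\Gamma(\alpha)=\int_0^{\infty}t^{\alpha-1}e^{-t}\,dt$$

$$\Gamma(\alpha+1)=\alpha\,\Gamma(\alpha),\quad \alpha>0$$

When the argument is a positive integer n:

$$\Gamma(n)=(n-1)!$$

The mean of the beta distribution is:

$$E(x)=\frac{\alpha}{\alpha+\beta}$$

The variance of the beta distribution is:

$$\mathrm{Var}(x)=\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$$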

S2.3: From the historical reputation information provided by S1, compute the mean accuracy of the k task results; the calculation formula is:

The variance of the accuracy of the k task results is obtained at the same time; the calculation formula is:

S2.4: Setting the mean and variance of the beta distribution equal to the accuracy mean and accuracy variance of the k task results gives a system of equations that can be solved for α and β. The calculation formula of α is:

The calculation formula of β is:

S2.5: Substitute the values of α and β obtained in S2.4 into the probability density function of the beta distribution to obtain the prior distribution of the accuracy of the data provided by the crowdsourcing workers.
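The system of equations in S2.4 can be solved in closed form by matching moments. The following is a sketch, assuming the accuracy mean of the k task results is written $\mu$ and their accuracy variance $\sigma^2$ (these symbols are illustrative and are not taken from the original formulas):

$$\mu=\frac{\alpha}{\alpha+\beta},\qquad \sigma^2=\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$$

Writing $s=\alpha+\beta$ gives $\alpha=\mu s$ and $\beta=(1-\mu)s$, so that $\sigma^2=\mu(1-\mu)/(s+1)$ and therefore

$$s=\frac{\mu(1-\mu)}{\sigma^2}-1,\qquad \alpha=\mu\left(\frac{\mu(1-\mu)}{\sigma^2}-1\right),\qquad \beta=(1-\mu)\left(\frac{\mu(1-\mu)}{\sigma^2}-1\right)$$

Both parameters are positive only when $0<\sigma^2<\mu(1-\mu)$, so a worker whose historical accuracies are all identical (zero variance) would need separate handling.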

Preferably, the data set in S3 consists of all task results submitted by all workers in the current task.

Preferably, the voting consistency method comprises the following steps:

S4.1: Preprocess the data to determine whether the task results submitted by each crowdsourcing worker are correct or incorrect;

S4.2: Set a threshold according to the requirements, and judge whether the difference between each crowdsourcing worker's submitted task result and the average result is within the threshold; if so, the result meets the requirement, otherwise it does not;

S4.3: According to the judgment results, calculate the accuracy of the extracted partial task results of each worker;

S4.4: Finally, from the probability density function of the beta distribution and the accuracy of each worker's extracted partial task results, obtain the conditional probability of the accuracy of the results submitted by the crowdsourcing workers in the current task.

Preferably, the posterior accuracy information of the data results submitted by the crowdsourcing workers in the current task is calculated by the following formula:

Beneficial effects: by combining the beta distribution, the voting consistency rule and a Bayesian algorithm, the method predicts the accuracy of the result data provided by crowdsourcing workers in the current task, and can provide good recommendation precision when analyzing the crowdsourcing result data of large data sets.

Drawings

Fig. 1 is a schematic representation of the beta distribution prior employed in the method of the present invention.

Detailed Description

The invention is described in further detail below with reference to the accompanying drawing.

In this embodiment, an employer issues 400 common-sense questions and asks m crowdsourcing workers to answer them. Each worker's answer to each question is a task result, and together the answers form the Common Sense Question (CSQ) data set. The employer can determine the accuracy of the data results provided by the crowdsourcing workers through the following steps.

Step 1: From the basic information of crowdsourcing workers provided by the crowdsourcing platform, extract the accuracy information of k task results from the continuous period of time in which each worker's performance is most stable, and use it as the historical reputation information.

Step 2: As shown in Fig. 1, analyze the historical reputation information from step 1 using a beta distribution to obtain the prior distribution of the crowdsourcing workers' work accuracy. This specifically includes the following steps:

S2.1: The accuracy a of the results provided by a worker represents the accuracy with which the worker completes tasks. The prior distribution of a is $g(a_m)$, m = (1, 2, 3, …, m), where there are m workers in total.

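S2.2: The probability density function of the beta distribution is:

$$f(x;\alpha,\beta)=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,x^{\alpha-1}(1-x)^{\beta-1},\quad 0<x<1$$

where α and β are the two parameters of the beta distribution and Γ is the gamma function:

$$\Gamma(\alpha)=\int_0^{\infty}t^{\alpha-1}e^{-t}\,dt$$

$$\Gamma(\alpha+1)=\alpha\,\Gamma(\alpha),\quad \alpha>0$$

When the argument is a positive integer n:

$$\Gamma(n)=(n-1)!$$

The mean of the beta distribution is:

$$E(x)=\frac{\alpha}{\alpha+\beta}$$

The variance of the beta distribution is:

$$\mathrm{Var}(x)=\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$$

S2.3: From the historical reputation information provided by step 1, compute the mean accuracy of the k task results; the calculation formula is:

The variance of the accuracy of the k task results is obtained at the same time; the calculation formula is:

S2.4: Setting the mean and variance of the beta distribution equal to the accuracy mean and accuracy variance of the k task results gives a system of equations that can be solved for α and β. The calculation formula of α is:

The calculation formula of β is:

S2.5: Substitute the values of α and β obtained in S2.4 into the probability density function of the beta distribution to obtain the prior distribution of the accuracy of the data provided by the crowdsourcing workers.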
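Step 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names `fit_beta_prior` and `beta_pdf` and the sample history are hypothetical, the variance is taken as the population variance (divisor k), which the text does not specify, and α and β are obtained by the standard moment-matching solution of the mean and variance equations of the beta distribution.

```python
import math
from statistics import fmean, pvariance


def fit_beta_prior(accuracies):
    """Moment-match Beta(alpha, beta) to k historical accuracy values in (0, 1)."""
    mu = fmean(accuracies)
    var = pvariance(accuracies, mu)  # divisor k; this choice is an assumption
    if not 0.0 < mu < 1.0:
        raise ValueError("mean accuracy must lie strictly between 0 and 1")
    if not 0.0 < var < mu * (1.0 - mu):
        raise ValueError("variance must satisfy 0 < var < mean * (1 - mean)")
    common = mu * (1.0 - mu) / var - 1.0  # equals alpha + beta
    return mu * common, (1.0 - mu) * common


def beta_pdf(x, alpha, beta):
    """Beta density for 0 < x < 1, computed in log space to avoid overflow."""
    log_coef = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    log_body = (alpha - 1.0) * math.log(x) + (beta - 1.0) * math.log(1.0 - x)
    return math.exp(log_coef + log_body)


# Hypothetical history: accuracy of k = 5 past task results for one worker.
history = [0.80, 0.85, 0.90, 0.75, 0.80]
alpha, beta = fit_beta_prior(history)
prior_density_at_mean = beta_pdf(fmean(history), alpha, beta)
```

The fitted prior has the same mean and variance as the worker's history, so a worker with stable past accuracy gets a sharply peaked prior and an erratic worker gets a flatter one.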

Step 3: From the data set CSQ, randomly extract the results of n (n < 400) questions provided by the m workers, where the same questions are extracted for every worker. The extracted results may be expressed as:

$$r_{n,m}=(r_{1,1},r_{1,2},r_{1,3},\ldots,r_{1,m};\;r_{2,1},r_{2,2},r_{2,3},\ldots,r_{2,m};\;\ldots;\;r_{n,1},r_{n,2},r_{n,3},\ldots,r_{n,m})$$

where n denotes the number of questions, m denotes the number of workers, and $r_{1,1},r_{1,2},r_{1,3},\ldots,r_{1,m}$ are the answers of all crowdsourcing workers to the 1st question.
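The sampling in step 3 can be sketched as below. This is an illustration only: the function name `sample_questions`, the list-of-rows layout (one row per question, one column per worker) and the randomly generated binary answers are assumptions made for the example. Sampling whole rows is what keeps the extracted questions identical for all workers.

```python
import random


def sample_questions(dataset, n, seed=None):
    """dataset[i][j] is worker j's answer to question i; returns (indices, rows)."""
    if not 0 < n < len(dataset):
        raise ValueError("n must be positive and smaller than the number of questions")
    rng = random.Random(seed)
    # Sample question indices without replacement; every worker keeps the same rows.
    indices = sorted(rng.sample(range(len(dataset)), n))
    return indices, [dataset[i] for i in indices]


# Hypothetical CSQ-like data: 400 questions, 5 workers, binary answers.
data_rng = random.Random(0)
csq = [[data_rng.randint(0, 1) for _ in range(5)] for _ in range(400)]
indices, r = sample_questions(csq, n=50, seed=42)
```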

Step 4: Analyze the partial crowdsourcing task result data from step 3 using the voting consistency rule to obtain the conditional probability of the crowdsourcing data accuracy. This specifically includes the following steps:

S4.1: Preprocess the data. From the task results provided in step 3, compute the average of the results of the m workers for the n-th question; the calculation formula is:

where $r_{n,m}$ is the result of the m-th worker answering the n-th question.

After the average is calculated, judge according to it whether the task results submitted by the m crowdsourcing workers are correct or incorrect; the calculation formula is:

S4.2: Let ω be the threshold for judging task accuracy, set according to the requirements. If the difference between $r_{n,m}$ and the average value is smaller than the set threshold, the worker's answer to the question meets the requirement; otherwise it does not meet the requirement and is regarded as a wrong answer;

S4.3: According to the judgment results, calculate the accuracy of the extracted partial task results of each worker; the calculation formula is:

S4.4: Finally, from the probability density function of the beta distribution and the accuracy of each worker's extracted partial task results, obtain the conditional probability of the accuracy of the results submitted by the crowdsourcing workers in the current task; the calculation formula is:

where τ is a preset parameter with 0 < τ < 1, and $g(a_m)$ represents the credibility of the m workers participating in the task, that is, the comprehensive reputation information obtained from the historical task reputation information.
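Steps S4.1 to S4.3 can be sketched as follows. The sketch assumes that answers are numeric, that the consensus for a question is the arithmetic mean of all workers' answers, and that a worker's accuracy on the sample is the fraction of sampled questions whose answer lies within ω of that mean; the last point is an assumption, since the text does not spell out the accuracy formula. The function name and the sample data are hypothetical, and the conditional probability of S4.4, which depends on τ, is not covered.

```python
from statistics import fmean


def voting_consistency_accuracy(results, omega):
    """results[i][j] is worker j's numeric answer to sampled question i."""
    n_workers = len(results[0])
    correct = [0] * n_workers
    for row in results:
        avg = fmean(row)  # average answer of all workers to this question
        for j, answer in enumerate(row):
            if abs(answer - avg) < omega:  # within the threshold counts as correct
                correct[j] += 1
    # Fraction of sampled questions each worker answered consistently.
    return [c / len(results) for c in correct]


# Hypothetical answers: 3 sampled questions (rows) by 4 workers (columns).
sampled = [
    [1.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 1.0],
]
accuracy = voting_consistency_accuracy(sampled, omega=0.5)
```

In this example the first two workers agree with the consensus on every question, the third on two of three, and the fourth on one of three.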

Step 5: Combine the prior information obtained in step 2 and the conditional probability obtained in step 4 through a Bayesian model to obtain the accuracy of the crowdsourcing task result data provided by the crowdsourcing workers in the current crowdsourcing task; the calculation formula is:

The value given by this formula is the accuracy of the data provided by a worker, that is, the accuracy of the task results the crowdsourcing worker provides in the current task.
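To illustrate how a beta prior and current-task evidence combine, the sketch below uses the standard conjugate Beta-binomial update. This is a textbook technique substituted for illustration and is not the formula of the method described here, which also involves the parameter τ. The function name and the numbers are hypothetical; the count of consistent answers is assumed to come from the voting consistency check.

```python
def posterior_accuracy(alpha, beta, n_correct, n_sampled):
    """Conjugate Beta-binomial update; returns posterior parameters and mean."""
    if not 0 <= n_correct <= n_sampled:
        raise ValueError("n_correct must lie between 0 and n_sampled")
    post_alpha = alpha + n_correct                # prior successes plus observed
    post_beta = beta + (n_sampled - n_correct)    # prior failures plus observed
    return post_alpha, post_beta, post_alpha / (post_alpha + post_beta)


# Hypothetical worker: prior Beta(8, 2) (mean 0.8) from history, but only
# 30 of 50 sampled answers pass the consistency check in the current task.
a_post, b_post, mean_post = posterior_accuracy(8.0, 2.0, n_correct=30, n_sampled=50)
```

The posterior mean falls between the historical mean and the accuracy observed in the current task, which is the behavior needed when a worker's working state changes during a task.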

The invention addresses the problems faced by existing crowdsourcing data screening methods and provides a robust crowdsourcing data analysis method based on a trust model.

The above embodiments describe the present invention only by way of example. After reading the present application, those skilled in the art may make various modifications, equivalent substitutions and improvements without departing from the spirit and scope of the present invention.

Claims

1. A robust crowdsourcing data analysis method based on a trust model, characterized by comprising the following steps:

S1: Extract the historical reputation information of crowdsourcing workers from the basic worker information provided by the crowdsourcing platform;

S2: Analyze the historical reputation information from S1 using a beta distribution to obtain the prior distribution of the crowdsourcing workers' work accuracy;

S3: After the task issued by the employer is completed, randomly select a portion of the crowdsourcing workers' task results from the data set;

S4: Analyze the partial task result data from S3 using a voting consistency rule to obtain the conditional probability of the crowdsourcing data accuracy;

S5: Combine the prior information obtained in S2 and the conditional probability obtained in S4 through a Bayesian model to obtain the posterior accuracy information of the data results submitted by the crowdsourcing workers in the current task.

2. The robust crowdsourcing data analysis method based on a trust model according to claim 1, characterized in that: the historical reputation information extracted in S1 is the accuracy information of k task results from the continuous period of time in which the crowdsourcing worker's performance is most stable.

3. The robust crowdsourcing data analysis method based on a trust model according to claim 1, characterized in that: analyzing the historical reputation information with the beta distribution in S2 specifically comprises the following steps:

S2.1: The accuracy a of the results provided by a worker represents the accuracy with which the worker completes tasks. The prior distribution of a is $g(a_m)$, m = (1, 2, 3, …, m), where there are m workers in total.

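S2.2: The probability density function of the beta distribution is:

$$f(x;\alpha,\beta)=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,x^{\alpha-1}(1-x)^{\beta-1},\quad 0<x<1$$

where α and β are the two parameters of the beta distribution and Γ is the gamma function:

$$\Gamma(\alpha)=\int_0^{\infty}t^{\alpha-1}e^{-t}\,dt$$

$$\Gamma(\alpha+1)=\alpha\,\Gamma(\alpha),\quad \alpha>0$$

When the argument is a positive integer n:

$$\Gamma(n)=(n-1)!$$

The mean of the beta distribution is:

$$E(x)=\frac{\alpha}{\alpha+\beta}$$

The variance of the beta distribution is:

$$\mathrm{Var}(x)=\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$$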

S2.3: From the historical reputation information provided by S1, compute the mean accuracy of the k task results; the calculation formula is:

The variance of the accuracy of the k task results is obtained at the same time; the calculation formula is:

S2.4: Setting the mean and variance of the beta distribution equal to the accuracy mean and accuracy variance of the k task results gives a system of equations that can be solved for α and β. The calculation formula of α is:

The calculation formula of β is:

S2.5: Substitute the values of α and β obtained in S2.4 into the probability density function of the beta distribution to obtain the prior distribution of the accuracy of the data provided by the crowdsourcing workers.

4. The robust crowdsourcing data analysis method based on a trust model according to claim 1, characterized in that: the data set in S3 consists of all task results submitted by all workers in the current task.

5. The robust crowdsourcing data analysis method based on a trust model according to claim 1, characterized in that: the voting consistency method comprises the following steps:

S4.1: Preprocess the data to determine whether the task results submitted by each crowdsourcing worker are correct or incorrect;

S4.2: Set a threshold according to the requirements, and judge whether the difference between each crowdsourcing worker's submitted task result and the average result is within the threshold; if so, the result meets the requirement, otherwise it does not;

S4.3: According to the judgment results, calculate the accuracy of the extracted partial task results of each worker;

S4.4: Finally, from the probability density function of the beta distribution and the accuracy of each worker's extracted partial task results, obtain the conditional probability of the accuracy of the results submitted by the crowdsourcing workers in the current task.

6. The robust crowdsourcing data analysis method based on a trust model according to claim 1, characterized in that: the posterior accuracy information of the data results submitted by the crowdsourcing workers in the current task is calculated by the following formula: