CN107767055B - Crowdsourcing result aggregation method and device based on collusion detection - Google Patents

Publication number
CN107767055B
Authority
CN
China
Prior art keywords
worker
answer set
repeated
answers
answer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711003779.2A
Other languages
Chinese (zh)
Other versions
CN107767055A (en
Inventor
孙海龙
王旭
陈鹏鹏
方毅立
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN201711003779.2A priority Critical patent/CN107767055B/en
Publication of CN107767055A publication Critical patent/CN107767055A/en
Application granted granted Critical
Publication of CN107767055B publication Critical patent/CN107767055B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/06: Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063: Operations research, analysis or management
    • G06Q10/0639: Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/205: Parsing

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a crowdsourcing result aggregation method and device based on collusion detection. The method comprises the following steps: collecting, from a crowdsourcing platform, the answer set of each worker for a task set; calculating an aggregation result of the answer set, and calculating a consistency parameter between the aggregation result and the answers of each worker; determining repeated answer sets from the answer set, and calculating the worker ability change rate corresponding to each repeated answer set based on the consistency parameters of the workers' answers; for a repeated answer set whose worker ability change rate is less than or equal to a preset threshold, determining that the repeated answer set was generated normally and retaining it in the answer set; for a repeated answer set whose worker ability change rate is greater than the preset threshold, determining that the repeated answer set was generated by collusion and deleting it from the answer set; and obtaining an updated answer set and calculating an aggregation result of the updated answer set.

Description

Crowdsourcing result aggregation method and device based on collusion detection
Technical Field
The invention relates to the technical field of crowdsourcing, and in particular to a crowdsourcing result aggregation method and device based on collusion detection.
Background
Crowdsourcing is a rapidly developing field that uses human cognitive strengths to solve problems that are difficult for computers. Popular general-purpose platforms such as CrowdFlower and Amazon Mechanical Turk (AMT) are widely used for general data processing tasks such as sentiment analysis, handwriting recognition and image tagging. A core problem of crowdsourcing is ensuring result quality, since workers may return poor-quality results. A widely adopted quality control method is result aggregation, which first assigns each task to multiple workers and then uses an inference algorithm to aggregate the results they return. Taking image annotation as an example, one image is distributed to several workers, each of whom provides tags describing its content. Finally, a high-quality result is aggregated from all the collected tags by voting or another inference method.
In crowdsourcing, in order to earn more reward for less work, colluders form collusion teams outside the platform through text messages, WeChat, telephone, forums or even face-to-face communication. In a collusion team, only one worker processes the task and the other workers copy that worker's answer, so that in the end all workers in the team submit the same answer. These malicious repeated answers dominate the answers provided by normal workers during result aggregation and reduce result quality. For example, if a task is assigned to five workers and three of them collude, then under majority voting the final aggregated result equals the answer provided by the colluders.
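The five-worker example can be illustrated with a minimal Python sketch of majority voting. The answer labels ("cat", "dog") and the helper name `majority_vote` are hypothetical and chosen only for illustration; they do not come from the patent.

```python
from collections import Counter

def majority_vote(answers):
    """Return the most frequent answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]

# Hypothetical data: two independent workers answer "cat",
# three colluding workers all submit the same copied answer "dog".
honest = ["cat", "cat"]
colluders = ["dog", "dog", "dog"]

# With the copied answers counted, the colluders' answer wins the vote.
with_collusion = majority_vote(honest + colluders)

# If the collusion team counted only once, the independent workers would win.
without_copies = majority_vote(honest + colluders[:1])

print(with_collusion, without_copies)
```

The sketch only shows why copied answers can dominate a vote; it does not detect collusion.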
As can be seen from the above, repeated answers generated by collusion harm the result quality of general tasks on a general-purpose platform. However, existing collusion detection algorithms cannot effectively detect such collusion or eliminate its negative effects.
Disclosure of Invention
In order to solve the above technical problem, embodiments of the present invention provide a crowdsourcing result aggregation method and apparatus based on collusion detection.
The crowdsourcing result aggregation method based on collusion detection provided by the embodiment of the invention comprises the following steps:
collecting, from a crowdsourcing platform, the answer set of each worker for a task set;
calculating an aggregation result of the answer set, and calculating a consistency parameter between the aggregation result and the answers of each worker;
determining repeated answer sets from the answer set, and calculating the worker ability change rate corresponding to each repeated answer set based on the consistency parameters of the workers' answers;
for a repeated answer set whose worker ability change rate is less than or equal to a preset threshold, determining that the repeated answer set was generated normally and retaining it in the answer set;
for a repeated answer set whose worker ability change rate is greater than the preset threshold, determining that the repeated answer set was generated by collusion and deleting it from the answer set;
and, after each repeated answer set has been retained or deleted, obtaining an updated answer set and calculating an aggregation result of the updated answer set.
In an embodiment of the present invention, the calculating consistency parameters of the convergence result and the answers of the workers includes:
calculating a consistency parameter of the aggregated results and the answers of each worker based on the following formula:
Figure BDA0001444018210000021
wherein P_i is the consistency parameter between the aggregation result and the answers of worker i, L_i is the set of answers returned by worker i for the task set, and
Figure BDA0001444018210000022
is the aggregate result of the answer set.
In this embodiment of the present invention, the calculating a worker capability change rate corresponding to each repeated answer set based on the consistency parameter of the answers of each worker includes:
calculating a first variance of the consistency parameter of the answers of each worker when a set of repeated answers remains in the set of answers;
calculating a second variance of the consistency parameter of the answers of each worker when a duplicate answer set is deleted in the answer set;
and calculating the worker capacity change rate corresponding to the repeated answer set based on the first variance and the second variance.
In an embodiment of the present invention, the calculating a first variance of the consistency parameter of the answers of each worker when the repeated answer set is retained in the answer set includes:
calculating, by the following formula, the first variance of the consistency parameters of the workers' answers when the repeated answer set
Figure BDA0001444018210000031
is retained in the answer set:
Figure BDA0001444018210000032
wherein Var(P) is the first variance, E(P) is the average of the consistency parameters of the workers, P_i is the consistency parameter between the aggregation result and the answers of worker i, and
Figure BDA0001444018210000033
is the answer set.
In this embodiment of the present invention, the calculating a second variance of the consistency parameter of the answers of each worker when the repeated answer set is deleted from the answer set includes:
calculating, by the following formula, the second variance of the consistency parameters of the workers' answers when the repeated answer set
Figure BDA0001444018210000034
is deleted from the answer set:
Figure BDA0001444018210000035
wherein Var(P^k) is the second variance, E(P^k) is the average of the consistency parameters of the workers, P_i^k is the consistency parameter between the aggregation result and the answers of worker i, and
Figure BDA0001444018210000036
is the answer set remaining after
Figure BDA0001444018210000037
is deleted.
In an embodiment of the present invention, the calculating a worker capability change rate corresponding to the repeated answer set based on the first variance and the second variance includes:
calculating the worker ability change rate corresponding to the repeated answer set
Figure BDA0001444018210000038
based on the following formula:
Figure BDA0001444018210000039
wherein
Figure BDA00014440182100000310
is the worker ability change rate.
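The following Python sketch illustrates how a change rate could be derived from the first and second variances. The exact formula appears only in the figure above and is not recoverable from the text, so the relative change |Var(P^k) - Var(P)| / Var(P) used here is an assumption, as are the function name `ability_change_rate`, the use of the population variance, and the sample consistency values.

```python
from statistics import pvariance

def ability_change_rate(p_kept, p_deleted):
    """Relative change in the variance of the consistency parameters.

    p_kept:    consistency parameters of all workers with the repeated
               answer set retained (used for the first variance).
    p_deleted: consistency parameters of the remaining workers after the
               repeated answer set is deleted (used for the second variance).
    ASSUMPTION: the rate is |Var(P^k) - Var(P)| / Var(P); the patent's own
    formula is given only as an image and may differ.
    """
    var_kept = pvariance(p_kept)        # first variance, Var(P)
    var_deleted = pvariance(p_deleted)  # second variance, Var(P^k)
    if var_kept == 0:
        # Guard against division by zero (an illustrative choice).
        return 0.0 if var_deleted == 0 else float("inf")
    return abs(var_deleted - var_kept) / var_kept

# Hypothetical values: three workers agree fully with the aggregation
# result, two agree on half of the tasks.
p_kept = [1.0, 1.0, 1.0, 0.5, 0.5]
# After deleting a repeated answer set, two workers remain, in full agreement.
p_deleted = [1.0, 1.0]

rate = ability_change_rate(p_kept, p_deleted)
print(rate)
```

A rate computed this way would then be compared with the preset threshold to decide whether the repeated answer set is retained or deleted.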
The embodiment of the invention provides a crowdsourcing result aggregation device based on collusion detection, which comprises:
a collection module, configured to collect, from a crowdsourcing platform, the answer set of each worker for a task set;
a consistency calculation module, configured to calculate an aggregation result of the answer set and to calculate a consistency parameter between the aggregation result and the answers of each worker;
a worker capability change rate module, configured to determine repeated answer sets from the answer set and to calculate the worker ability change rate corresponding to each repeated answer set based on the consistency parameters of the workers' answers;
a collusion detection module, configured to: for a repeated answer set whose worker ability change rate is less than or equal to a preset threshold, determine that the repeated answer set was generated normally and retain it in the answer set; and for a repeated answer set whose worker ability change rate is greater than the preset threshold, determine that the repeated answer set was generated by collusion and delete it from the answer set;
and an aggregation module, configured to obtain an updated answer set after each repeated answer set has been retained or deleted, and to calculate an aggregation result of the updated answer set.
In an embodiment of the present invention, the consistency calculation module is specifically configured to calculate consistency parameters of the convergence result and answers of each worker based on the following formulas:
Figure BDA0001444018210000041
wherein P_i is the consistency parameter between the aggregation result and the answers of worker i, L_i is the set of answers returned by worker i for the task set, and
Figure BDA0001444018210000042
is the aggregate result of the answer set.
In an embodiment of the present invention, the worker capability change rate module includes:
a first variance calculating unit for calculating a first variance of a consistency parameter of the answers of each worker when a repeated answer set remains in the answer set;
a second variance calculating unit for calculating a second variance of the consistency parameter of the answers of each worker when the repeated answer set is deleted in the answer set;
and the worker capacity change rate calculation unit is used for calculating the worker capacity change rate corresponding to the repeated answer set based on the first variance and the second variance.
In an embodiment of the present invention, the first variance calculating unit is specifically configured to calculate, by the following formula, the first variance of the consistency parameters of the workers' answers when the repeated answer set
Figure BDA0001444018210000043
is retained in the answer set:
Figure BDA0001444018210000044
wherein Var(P) is the first variance, E(P) is the average of the consistency parameters of the workers, P_i is the consistency parameter between the aggregation result and the answers of worker i, and
Figure BDA0001444018210000051
is the answer set.
In an embodiment of the present invention, the second variance calculating unit is specifically configured to calculate, by the following formula, the second variance of the consistency parameters of the workers' answers when the repeated answer set
Figure BDA0001444018210000052
is deleted from the answer set:
Figure BDA0001444018210000053
wherein Var(P^k) is the second variance, E(P^k) is the average of the consistency parameters of the workers, P_i^k is the consistency parameter between the aggregation result and the answers of worker i, and
Figure BDA0001444018210000054
is the answer set remaining after
Figure BDA0001444018210000055
is deleted.
In an embodiment of the present invention, the worker capability change rate calculating unit is specifically configured to calculate the worker ability change rate corresponding to the repeated answer set
Figure BDA0001444018210000056
based on the following formula:
Figure BDA0001444018210000057
wherein
Figure BDA0001444018210000058
is the worker ability change rate.
The technical scheme of the embodiment of the invention has the following advantages:
(1) Unlike spatio-temporal crowdsourcing and social network scenarios, the features of the answers to general tasks on a general-purpose platform are unknown. The embodiment of the invention therefore introduces the concept of consistency between worker answers and the aggregation result to describe the influence of collusion-generated repeated answers on result aggregation.
(2) Unlike the similarity-based collusion detection algorithms used on e-commerce platforms, the embodiment of the invention provides a collusion detection method based on the worker ability change rate, which can identify repeated answers generated by collusion within an answer set that also contains normal repeated answers.
(3) The embodiment of the invention provides a crowdsourcing result aggregation method with collusion detection, which can effectively eliminate the negative influence of collusion on result aggregation.
Drawings
FIG. 1 is a schematic diagram of a crowdsourcing framework based on collusion detection according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a crowdsourcing result aggregation method based on collusion detection according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a crowdsourcing result aggregation device based on collusion detection according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of the worker capability change rate module according to an embodiment of the present invention.
Detailed Description
So that the manner in which the features and aspects of the embodiments of the present invention can be understood in detail, a more particular description of the embodiments of the invention, briefly summarized above, may be had by reference to the embodiments, some of which are illustrated in the appended drawings.
Existing collusion detection algorithms cannot effectively detect collusion or eliminate its negative effects, mainly for the following reasons:
(1) Collusion detection algorithms for spatio-temporal crowdsourcing and social networks need to extract certain features of the data; in spatio-temporal crowdsourcing, for example, collusion is detected using the spatial and temporal features of the collected data. However, such features are difficult to obtain on a general-purpose crowdsourcing platform.
(2) Detection algorithms for e-commerce platforms mainly detect collusion based on the similarity between the answers provided by each pair of workers. However, repeated answers to tasks on a general-purpose platform may be either normal repetition or collusive repetition. On simple tasks workers show high ability, and many of the repeated answers are generated normally. Detecting collusion based on answer similarity would therefore misjudge normally repeated answers as answers generated by collusion.
(3) On auction platforms, participants often collude to obtain a high return at low cost. The corresponding algorithms detect collusion mainly on the basis of game theory and are difficult to apply to general tasks on a general-purpose platform.
In summary, for general tasks on a general-purpose platform, existing algorithms cannot effectively detect collusion-generated repeated answers or eliminate their harm to result quality. To address this problem, the technical scheme of the embodiment of the invention provides a crowdsourcing quality control method based on collusion detection.
FIG. 1 is a schematic diagram of a crowdsourcing framework based on collusion detection according to an embodiment of the present invention. As shown in FIG. 1, the framework includes the following steps:
(1) The requester publishes tasks to a crowdsourcing platform, such as Amazon Mechanical Turk, and pays workers a reward according to the quality of their answers.
(2) Tasks are assigned to workers according to scheduling policies and user-specified platform constraints.
(3) In practice, some workers are not independent and may even cooperate outside the platform to handle crowdsourcing tasks. Workers may collude with one another behind the scenes; for example, a worker may use an online forum to contact others working on the same crowdsourcing task. After the tasks are processed, the answers are collected and noisy answers are eliminated, for example answers that are obviously unrelated to the picture in an image tagging task.
(4) This step involves collusion detection and result aggregation. After the answers returned by all workers have been collected, the embodiment of the invention applies a collusion detection mechanism to detect collusion and then filters out the repeated answers generated by colluders. After result filtering, the embodiment of the invention uses an aggregation method to infer the final result of each task and submits it to the requester.
The core of the framework is step (4), which applies the collusion detection method provided by the embodiment of the invention and then a result inference method, so that a high-quality result can be inferred even in the presence of collusion.
The collusion detection crowdsourcing framework provided by the embodiment of the invention addresses the problem that existing result aggregation algorithms cannot effectively eliminate the harm of collusion to result aggregation. Unlike a general crowdsourcing framework, workers in the proposed framework are no longer assumed to be independent; they may communicate or even collude with each other. In addition, the result inference part of the framework includes a collusion detection process.
The technical scheme of the embodiment of the invention comprises three steps as a whole: collusion detection, result filtering and result aggregation. The three steps are described below.
Step one: collusion detection
(1) Calculating the consistency between the aggregation result and the worker answers: after the workers finish processing the tasks, the answers they return are first collected. Suppose that, for a task set
Figure BDA0001444018210000071
the answer set returned by the workers is
Figure BDA0001444018210000072
and let
Figure BDA0001444018210000073
be one of the repeated answer sets in the answer set. The purpose of the embodiment of the invention is to judge whether a repeated set was generated by collusion and, on that basis, to aggregate the answer set
Figure BDA0001444018210000074
to obtain high-quality results.
The answer set is aggregated by majority voting to obtain the aggregation result
Figure BDA0001444018210000075
The embodiment of the invention provides the following formula for the consistency between the aggregation result and the answers of worker i:
Figure BDA0001444018210000076
wherein, for the task set
Figure BDA0001444018210000077
L_i is the answer set returned by worker i.
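As a rough illustration of this step, the Python sketch below aggregates answers by per-task majority voting and then scores each worker against the aggregation result. The consistency formula itself is available only as an image, so the definition used here (the fraction of a worker's answers that match the aggregation result) is an assumption, and the names `aggregate`, `consistency` and the sample data are hypothetical.

```python
from collections import Counter

def aggregate(answer_set):
    """Majority vote per task; answer_set maps worker -> {task: answer}."""
    votes = {}
    for answers in answer_set.values():
        for task, answer in answers.items():
            votes.setdefault(task, Counter())[answer] += 1
    # Ties go to the answer that was seen first.
    return {task: c.most_common(1)[0][0] for task, c in votes.items()}

def consistency(worker_answers, aggregated):
    """ASSUMED definition: share of the worker's answers equal to the aggregation result."""
    if not worker_answers:
        return 0.0
    agree = sum(1 for t, a in worker_answers.items() if aggregated.get(t) == a)
    return agree / len(worker_answers)

# Hypothetical answer set for two tasks and three workers.
answer_set = {
    "w1": {"t1": "cat", "t2": "dog"},
    "w2": {"t1": "cat", "t2": "dog"},
    "w3": {"t1": "cat", "t2": "fox"},
}

aggregated = aggregate(answer_set)
P = {w: consistency(a, aggregated) for w, a in answer_set.items()}
print(aggregated, P)
```

Here workers w1 and w2 agree with the aggregation result on both tasks, while w3 agrees on one of two.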
(2) Calculating the worker ability change rate of each repeated answer set: for a repeated answer set, the worker ability change rate measures the overall effect of that set on the consistency between the worker answers and the aggregation result. The embodiment of the invention formalizes the worker ability change rate as the change in the variance of the overall consistency before and after the repeated answer set is deleted. First, the variance of worker answer consistency is computed with the repeated answer set
Figure BDA0001444018210000081
retained:
Figure BDA0001444018210000082
Deleting the repeated answer set
Figure BDA0001444018210000083
yields
Figure BDA0001444018210000084
Similarly, for the pruned answer set
Figure BDA0001444018210000085
the variance of worker answer consistency is computed as:
Figure BDA0001444018210000086
Then, the worker ability change rate is obtained from the two variances above:
Figure BDA0001444018210000087
(3) Determining whether the repeated answers were generated by collusion: when
Figure BDA0001444018210000088
is less than or equal to the threshold, the repeated set
Figure BDA0001444018210000089
is considered to consist of normal repeated answers. When
Figure BDA00014440182100000810
exceeds the threshold, the repeated set
Figure BDA00014440182100000811
is considered to consist of repeated answers generated by collusion.
In the above scheme, when calculating the variance of worker answer consistency for the repeated answer set
Figure BDA00014440182100000812
the aggregation result
Figure BDA00014440182100000813
may also be obtained using other aggregation algorithms, such as a probabilistic aggregation method.
Step two: result filtering
The above steps are repeated for the answer set
Figure BDA00014440182100000814
until all repeated sets in it have been checked. Repeated answers determined to be generated by collusion are deleted, and answers determined to be normal repetition are retained.
Step three: result aggregation
The filtered answer set is then aggregated using an existing result aggregation algorithm to obtain the final result.
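The three steps can be strung together as in the following Python sketch. It is a sketch under stated assumptions, not the patented formulas: the consistency measure (fraction of matching answers), the change rate (relative change in population variance), the definition of a repeated answer set (two or more workers with identical answers on every task), the threshold value and all function names are hypothetical. The toy data were chosen so that the colluding group exceeds the threshold.

```python
from collections import Counter
from statistics import pvariance

def aggregate(answer_set):
    """Majority vote per task; answer_set maps worker -> {task: answer}."""
    votes = {}
    for answers in answer_set.values():
        for task, answer in answers.items():
            votes.setdefault(task, Counter())[answer] += 1
    return {task: c.most_common(1)[0][0] for task, c in votes.items()}

def consistency_variance(answer_set):
    """Variance of each worker's agreement with the majority vote (assumed form).
    Assumes every worker answered at least one task."""
    aggregated = aggregate(answer_set)
    scores = [
        sum(aggregated[t] == a for t, a in answers.items()) / len(answers)
        for answers in answer_set.values()
    ]
    return pvariance(scores)

def repeated_sets(answer_set):
    """ASSUMPTION: a repeated answer set is a group of two or more workers
    whose answers are identical on every task."""
    groups = {}
    for worker, answers in answer_set.items():
        groups.setdefault(tuple(sorted(answers.items())), []).append(worker)
    return [workers for workers in groups.values() if len(workers) > 1]

def change_rate(answer_set, group):
    """ASSUMED rate: relative change in variance when the group is deleted."""
    var_kept = consistency_variance(answer_set)
    remaining = {w: a for w, a in answer_set.items() if w not in group}
    if not remaining:
        return 0.0  # nothing left to compare against; keep the group
    var_deleted = consistency_variance(remaining)
    if var_kept == 0:
        return 0.0 if var_deleted == 0 else float("inf")
    return abs(var_deleted - var_kept) / var_kept

def aggregate_with_collusion_detection(answer_set, threshold):
    """Step one: detect; step two: filter; step three: aggregate."""
    filtered = dict(answer_set)
    for group in repeated_sets(answer_set):
        if change_rate(answer_set, group) > threshold:
            for worker in group:
                filtered.pop(worker, None)
    return aggregate(filtered)

# Hypothetical data: w3, w4 and w5 submit identical answers.
copied = {"t1": "x", "t2": "y", "t3": "z"}
answer_set = {
    "w1": {"t1": "a", "t2": "b", "t3": "c"},
    "w2": {"t1": "a", "t2": "b", "t3": "d"},
    "w3": dict(copied), "w4": dict(copied), "w5": dict(copied),
}
final = aggregate_with_collusion_detection(answer_set, threshold=0.5)
print(final)
```

In this sketch every repeated set is scored against the original answer set rather than against a progressively filtered one; either order is a design choice that the text does not settle.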
FIG. 2 is a schematic flow chart of a crowdsourcing result aggregation method based on collusion detection according to an embodiment of the present invention. As shown in FIG. 2, the method includes the following steps:
Step 201: Collect, from the crowdsourcing platform, the answer set of each worker for the task set.
Step 202: Calculate an aggregation result of the answer set, and calculate a consistency parameter between the aggregation result and the answers of each worker.
In the embodiment of the invention, the consistency parameters of the convergence result and the answers of each worker are calculated based on the following formula:
Figure BDA0001444018210000091
wherein P_i is the consistency parameter between the aggregation result and the answers of worker i, L_i is the set of answers returned by worker i for the task set, and
Figure BDA0001444018210000092
is the aggregate result of the answer set.
Step 203: and determining repeated answer sets from the answer sets, and calculating the worker capacity change rate corresponding to each repeated answer set based on the consistency parameters of the answers of all workers.
In this embodiment of the present invention, the calculating a worker capability change rate corresponding to each repeated answer set based on the consistency parameter of the answers of each worker includes:
calculating a first variance of the consistency parameter of the answers of each worker when a set of repeated answers remains in the set of answers;
calculating a second variance of the consistency parameter of the answers of each worker when a duplicate answer set is deleted in the answer set;
and calculating the worker capacity change rate corresponding to the repeated answer set based on the first variance and the second variance.
Specifically, the first variance of the consistency parameters of the workers' answers when the repeated answer set
Figure BDA0001444018210000093
is retained in the answer set is calculated by the following formula:
Figure BDA0001444018210000094
wherein Var(P) is the first variance, E(P) is the average of the consistency parameters of the workers, P_i is the consistency parameter between the aggregation result and the answers of worker i, and
Figure BDA0001444018210000095
is the answer set.
The second variance of the consistency parameters of the workers' answers when the repeated answer set
Figure BDA0001444018210000096
is deleted from the answer set is calculated by the following formula:
Figure BDA0001444018210000097
Figure BDA0001444018210000098
wherein Var(P^k) is the second variance, E(P^k) is the average of the consistency parameters of the workers, P_i^k is the consistency parameter between the aggregation result and the answers of worker i, and
Figure BDA0001444018210000101
is the answer set remaining after
Figure BDA0001444018210000102
is deleted.
The worker ability change rate corresponding to the repeated answer set
Figure BDA0001444018210000103
is calculated based on the following formula:
Figure BDA0001444018210000104
wherein
Figure BDA0001444018210000105
is the worker ability change rate.
Step 204: and for repeated answer sets with the worker capacity change rate less than or equal to a preset threshold value, determining that the repeated answer sets are normally generated and reserving the repeated answer sets in the answer sets.
Step 205: for a repeated answer set with a worker capacity change rate larger than a preset threshold value, determining that the repeated answer set is generated for collusion and deleting the repeated answer set in the answer set.
Step 206: and after the repeated answer sets are reserved or deleted, an updated answer set is obtained, and a convergence result of the updated answer set is calculated.
The collusion detection method provided by the embodiment of the invention can detect the collusion group with high precision according to the result given by a worker. Before and after a certain repeated answer set is deleted, the variation change of the consistency of the worker answers and results is used for formalizing the change rate of the human ability, and the collusion behavior is detected by using the scale of the change rate of the worker ability. The result processing method for deleting and then converging the collusion result provided by the embodiment of the invention can greatly improve the accuracy of the converging result. Different from the existing convergence algorithm, the result convergence method provided by the embodiment of the invention comprises detection of collusion behavior, can effectively eliminate the negative influence on result convergence, and improves the result quality.
Fig. 3 is a schematic structural composition diagram of a crowdsourcing result aggregation device based on collusion detection according to an embodiment of the present invention, and as shown in fig. 3, the device includes:
a collecting module 301, configured to collect answer sets of each worker for a task set from a crowdsourcing platform;
a consistency calculation module 302, configured to calculate a convergence result of the answer set, and calculate consistency parameters of the convergence result and the answers of each worker;
a worker capacity change rate module 303, configured to determine repeated answer sets from the answer sets, and calculate a worker capacity change rate corresponding to each repeated answer set based on a consistency parameter of answers of each worker;
a collusion detection module 304, configured to determine, for a repeated answer set in which a worker capability change rate is less than or equal to a preset threshold, that the repeated answer set is generated normally and retain the repeated answer set in the answer set; for a repeated answer set with a worker capacity change rate larger than a preset threshold value, determining that the repeated answer set is collusion generation and deleting the repeated answer set in the answer set;
the aggregation module 305 is configured to obtain an updated answer set after performing retention or deletion processing on each repeated answer set, and calculate an aggregation result of the updated answer set.
In an embodiment of the present invention, the consistency calculating module 302 is specifically configured to calculate consistency parameters of the convergence result and the answers of the workers based on the following formulas:
Figure BDA0001444018210000111
wherein P_i is the consistency parameter between the aggregation result and the answers of worker i, L_i is the set of answers returned by worker i for the task set, and
Figure BDA0001444018210000112
is the aggregate result of the answer set.
In an embodiment of the present invention, as shown in Fig. 4, the worker capability change rate module 303 includes:
a first variance calculating unit 3031, configured to calculate a first variance of the consistency parameters of the workers' answers when a repeated answer set is retained in the answer set;
a second variance calculating unit 3032, configured to calculate a second variance of the consistency parameters of the workers' answers when the repeated answer set is deleted from the answer set;
a worker capability change rate calculation unit 3033, configured to calculate the worker capability change rate corresponding to the repeated answer set based on the first variance and the second variance.
In an embodiment of the present invention, the first variance calculating unit 3031 is specifically configured to calculate, according to the following formula, the first variance of the consistency parameters of the workers' answers when the repeated answer set
Figure BDA0001444018210000113
is retained in the answer set:
Figure BDA0001444018210000114
where Var(P) is the first variance, E(P) is the average of the consistency parameters of the workers, P_i is the consistency parameter between the aggregation result and the answers of worker i, and
Figure BDA0001444018210000115
is the answer set.
In an embodiment of the present invention, the second variance calculating unit 3032 is specifically configured to calculate, according to the following formula, the second variance of the consistency parameters of the workers' answers when the repeated answer set
Figure BDA0001444018210000116
is deleted from the answer set:
Figure BDA0001444018210000121
where Var(P^k) is the second variance, E(P^k) is the average of the consistency parameters of the workers, P_i^k is the consistency parameter between the aggregation result and the answers of worker i, and
Figure BDA0001444018210000122
is the answer set that remains once
Figure BDA0001444018210000123
has been deleted.
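The two variances can be sketched as follows. Because the formula images are not reproduced, the code assumes the population variance, Var(P) = E[(P_i - E(P))^2], which matches the description of E(P) as the average of the consistency parameters. The helper name `consistency_variance` and both sets of consistency values are hypothetical.

```python
from statistics import pvariance


def consistency_variance(p):
    """Population variance of the per-worker consistency parameters.

    p: dict mapping a worker id to that worker's consistency parameter.
    """
    return pvariance(list(p.values()))


# Hypothetical consistency values with the repeated answer set retained
# (w1 and w2 are the workers who submitted the repeated answers).
p_keep = {"w1": 1.0, "w2": 1.0, "w3": 0.5, "w4": 0.75}
# Hypothetical values recomputed after those answers are deleted.
p_drop = {"w3": 1.0, "w4": 0.25}

var_keep = consistency_variance(p_keep)  # first variance, Var(P)
var_drop = consistency_variance(p_drop)  # second variance, Var(P^k)
```

In this toy case the spread of the consistency values grows once the repeated answers are removed, which is the kind of shift the change rate is meant to capture.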
In an embodiment of the present invention, the worker capability change rate calculation unit 3033 is specifically configured to calculate the worker capability change rate corresponding to the repeated answer set
Figure BDA0001444018210000124
based on the following formula:
Figure BDA0001444018210000125
where
Figure BDA0001444018210000126
is the worker capability change rate.
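A minimal sketch of the change rate and the threshold test follows. The formula image is not reproduced in the text, so the relative change |Var(P^k) - Var(P)| / Var(P) used here is an assumption, as are the function names, the zero-variance handling, and the threshold value 0.5. Only the decision rule (greater than the threshold means collusion, otherwise normal) comes from the description of module 304.

```python
def capability_change_rate(var_keep, var_drop):
    """Assumed form: relative change between the first and second variance."""
    if var_keep == 0:
        # Hypothetical guard: avoid dividing by zero when all workers agree equally.
        return 0.0 if var_drop == 0 else float("inf")
    return abs(var_drop - var_keep) / var_keep


def is_collusion(rate, threshold):
    """Decision rule from the description: strictly greater than the threshold."""
    return rate > threshold


rate = capability_change_rate(0.04296875, 0.140625)
flagged = is_collusion(rate, threshold=0.5)  # 0.5 is an illustrative threshold
```

A rate exactly equal to the threshold counts as normal generation, so the repeated answer set would be retained in that case.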
Those skilled in the art should understand that the functions of the modules in the crowdsourcing result aggregation device based on collusion detection shown in Fig. 3 can be understood with reference to the foregoing description of the crowdsourcing result aggregation method based on collusion detection. These functions may be implemented by a program running on a processor, or by a specific logic circuit.
The technical solutions described in the embodiments of the present invention may be combined arbitrarily, provided that they do not conflict.
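To show how the five modules could be wired together as a program, here is a compact end-to-end sketch. It rests on assumptions throughout: majority-vote aggregation, agreement fraction as the consistency parameter, population variance, a relative-change rate, and each repeated answer set being tested against the original answer set. None of the names (`aggregate`, `variance_of_consistency`, `change_rate`, `run`) or the toy data come from the patent.

```python
from collections import Counter, defaultdict
from statistics import pvariance


def aggregate(answers):
    """Assumed aggregation: per-task majority vote, ties go to the smallest label."""
    n_tasks = len(next(iter(answers.values())))
    out = []
    for t in range(n_tasks):
        counts = Counter(v[t] for v in answers.values())
        top = max(counts.values())
        out.append(min(label for label, c in counts.items() if c == top))
    return tuple(out)


def variance_of_consistency(answers):
    """Variance of the assumed consistency parameters over all workers."""
    agg = aggregate(answers)
    p = [sum(a == b for a, b in zip(v, agg)) / len(agg) for v in answers.values()]
    return pvariance(p)


def change_rate(var_keep, var_drop):
    """Assumed relative change between the two variances."""
    if var_keep == 0:
        return 0.0 if var_drop == 0 else float("inf")
    return abs(var_drop - var_keep) / var_keep


def run(answers, threshold):
    """Collect -> detect repeated sets -> keep or delete -> re-aggregate."""
    groups = defaultdict(list)
    for worker, vec in answers.items():
        groups[tuple(vec)].append(worker)
    var_keep = variance_of_consistency(answers)
    kept = dict(answers)
    for workers in groups.values():
        if len(workers) < 2:
            continue  # not a repeated answer set
        rest = {w: v for w, v in answers.items() if w not in workers}
        if not rest:
            continue  # nothing left to aggregate without this set
        if change_rate(var_keep, variance_of_consistency(rest)) > threshold:
            for w in workers:
                kept.pop(w)  # treated as collusion: delete from the answer set
    return kept, aggregate(kept)


# Toy data chosen only to exercise both branches of the threshold test.
answers = {
    "w1": (1, 0, 1, 1),
    "w2": (1, 0, 1, 1),
    "w3": (0, 0, 1, 0),
    "w4": (1, 1, 1, 1),
}
kept_low, result_low = run(answers, threshold=1.0)
kept_high, result_high = run(answers, threshold=3.0)
```

With the lower threshold the repeated answers of w1 and w2 are deleted before the final aggregation; with the higher one they are retained. How the preset threshold should be chosen is not stated in this section.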
In the embodiments provided in the present invention, it should be understood that the disclosed method and intelligent device may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or of another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that all or part of the steps of the method embodiments may be implemented by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments. The aforementioned storage medium includes a removable storage device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
Alternatively, the apparatus according to the embodiment of the present invention may be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a stand-alone product. Based on this understanding, the technical solutions of the embodiments of the present invention, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present invention. The aforementioned storage medium includes a removable storage device, a ROM, a RAM, a magnetic disk, an optical disk, and other media capable of storing program code.
The above describes only specific embodiments of the present invention, and the scope of protection of the present invention is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive of within the technical scope disclosed by the present invention shall be covered by the scope of protection of the present invention.

Claims (8)

1. A crowdsourcing result aggregation method based on collusion detection, the method comprising:
collecting, from a crowdsourcing platform, the answer set of each worker for a task set;
calculating an aggregation result of the answer set, and calculating a consistency parameter between the aggregation result and the answers of each worker; wherein calculating the consistency parameter between the aggregation result and the answers of each worker comprises: calculating the consistency parameter based on the following formula:
Figure FDA0003107247540000011
wherein P_i is the consistency parameter between the aggregation result and the answers L_i of worker i, L_i is the answers returned by worker i for the task set, and
Figure FDA0003107247540000012
is the aggregation result of the answer set;
determining repeated answer sets from the answer set, and calculating the worker capability change rate corresponding to each repeated answer set based on the consistency parameters of the workers' answers; wherein calculating the worker capability change rate corresponding to each repeated answer set based on the consistency parameters of the workers' answers comprises: calculating a first variance of the consistency parameters of the workers' answers when a repeated answer set is retained in the answer set; calculating a second variance of the consistency parameters of the workers' answers when the repeated answer set is deleted from the answer set; and calculating the worker capability change rate corresponding to the repeated answer set based on the first variance and the second variance;
for a repeated answer set whose worker capability change rate is less than or equal to a preset threshold, determining that the repeated answer set was generated normally and retaining it in the answer set;
for a repeated answer set whose worker capability change rate is greater than the preset threshold, determining that the repeated answer set was generated by collusion and deleting it from the answer set;
and obtaining an updated answer set after each repeated answer set has been retained or deleted, and calculating an aggregation result of the updated answer set.
2. The method of claim 1, wherein calculating the first variance of the consistency parameters of the workers' answers when a repeated answer set is retained in the answer set comprises:
calculating, by the following formula, the first variance of the consistency parameters of the workers' answers when the repeated answer set
Figure FDA0003107247540000013
is retained in the answer set:
Figure FDA0003107247540000021
wherein Var(P) is the first variance, E(P) is the average of the consistency parameters of the workers, P_i is the consistency parameter between the aggregation result and the answers of worker i, and
Figure FDA0003107247540000022
is the answer set.
3. The crowdsourcing result aggregation method based on collusion detection according to claim 1 or 2, wherein calculating the second variance of the consistency parameters of the workers' answers when the repeated answer set is deleted from the answer set comprises:
calculating, by the following formula, the second variance of the consistency parameters of the workers' answers when the repeated answer set
Figure FDA00031072475400000211
is deleted from the answer set:
Figure FDA0003107247540000023
wherein Var(P^k) is the second variance, E(P^k) is the average of the consistency parameters of the workers, P_i^k is the consistency parameter between the aggregation result and the answers of worker i, and
Figure FDA0003107247540000024
is the answer set that remains once
Figure FDA0003107247540000025
has been deleted.
4. The crowdsourcing result aggregation method based on collusion detection according to claim 3, wherein calculating the worker capability change rate corresponding to the repeated answer set based on the first variance and the second variance comprises:
calculating the worker capability change rate corresponding to the repeated answer set
Figure FDA0003107247540000026
based on the following formula:
Figure FDA0003107247540000027
wherein
Figure FDA0003107247540000028
is the worker capability change rate.
5. A crowdsourcing result aggregation device based on collusion detection, the device comprising:
a collecting module, configured to collect, from a crowdsourcing platform, the answer set of each worker for a task set;
a consistency calculation module, configured to calculate an aggregation result of the answer set, and to calculate a consistency parameter between the aggregation result and the answers of each worker; wherein calculating the consistency parameter between the aggregation result and the answers of each worker comprises: calculating the consistency parameter based on the following formula:
Figure FDA0003107247540000029
wherein P_i is the consistency parameter between the aggregation result and the answers L_i of worker i, L_i is the answers returned by worker i for the task set, and
Figure FDA00031072475400000210
is the aggregation result of the answer set;
a worker capability change rate module, configured to determine repeated answer sets from the answer set, and to calculate the worker capability change rate corresponding to each repeated answer set based on the consistency parameters of the workers' answers; wherein the worker capability change rate module comprises: a first variance calculating unit, configured to calculate a first variance of the consistency parameters of the workers' answers when a repeated answer set is retained in the answer set; a second variance calculating unit, configured to calculate a second variance of the consistency parameters of the workers' answers when the repeated answer set is deleted from the answer set; and a worker capability change rate calculation unit, configured to calculate the worker capability change rate corresponding to the repeated answer set based on the first variance and the second variance;
a collusion detection module, configured to: for a repeated answer set whose worker capability change rate is less than or equal to a preset threshold, determine that the repeated answer set was generated normally and retain it in the answer set; and for a repeated answer set whose worker capability change rate is greater than the preset threshold, determine that the repeated answer set was generated by collusion and delete it from the answer set;
and an aggregation module, configured to obtain an updated answer set after each repeated answer set has been retained or deleted, and to calculate an aggregation result of the updated answer set.
6. The crowdsourcing result aggregation device based on collusion detection according to claim 5, wherein the first variance calculating unit is specifically configured to calculate, by the following formula, the first variance of the consistency parameters of the workers' answers when the repeated answer set
Figure FDA0003107247540000031
is retained in the answer set:
Figure FDA0003107247540000032
wherein Var(P) is the first variance, E(P) is the average of the consistency parameters of the workers, P_i is the consistency parameter between the aggregation result and the answers of worker i, and
Figure FDA0003107247540000033
is the answer set.
7. The crowdsourcing result aggregation device based on collusion detection according to claim 5 or 6, wherein the second variance calculating unit is specifically configured to calculate, by the following formula, the second variance of the consistency parameters of the workers' answers when the repeated answer set
Figure FDA0003107247540000034
is deleted from the answer set:
Figure FDA0003107247540000035
wherein Var(P^k) is the second variance, E(P^k) is the average of the consistency parameters of the workers, P_i^k is the consistency parameter between the aggregation result and the answers of worker i, and
Figure FDA0003107247540000036
is the answer set that remains once
Figure FDA0003107247540000037
has been deleted.
8. The crowdsourcing result aggregation device based on collusion detection according to claim 7, wherein the worker capability change rate calculation unit is specifically configured to calculate the worker capability change rate corresponding to the repeated answer set
Figure FDA0003107247540000041
based on the following formula:
Figure FDA0003107247540000042
wherein
Figure FDA0003107247540000043
is the worker capability change rate.
CN201711003779.2A 2017-10-24 2017-10-24 Crowdsourcing result aggregation method and device based on collusion detection Active CN107767055B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711003779.2A CN107767055B (en) 2017-10-24 2017-10-24 Crowdsourcing result aggregation method and device based on collusion detection

Publications (2)

Publication Number Publication Date
CN107767055A CN107767055A (en) 2018-03-06
CN107767055B true CN107767055B (en) 2021-07-23

Family

ID=61270213

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711003779.2A Active CN107767055B (en) 2017-10-24 2017-10-24 Crowdsourcing result aggregation method and device based on collusion detection

Country Status (1)

Country Link
CN (1) CN107767055B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2743898C1 (en) 2018-11-16 2021-03-01 Общество С Ограниченной Ответственностью "Яндекс" Method for performing tasks
CN109978333B (en) * 2019-02-26 2023-04-07 湖南大学 Independent worker selection method based on community discovery and link prediction in crowdsourcing system
RU2744032C2 (en) 2019-04-15 2021-03-02 Общество С Ограниченной Ответственностью "Яндекс" Method and system for determining result of task execution in crowdsourced environment
RU2744038C2 (en) 2019-05-27 2021-03-02 Общество С Ограниченной Ответственностью «Яндекс» Method and a system for determining the result of a task in the crowdsourcing environment
RU2019128272A (en) 2019-09-09 2021-03-09 Общество С Ограниченной Ответственностью «Яндекс» Method and System for Determining User Performance in a Computer Crowdsourced Environment
RU2019135532A (en) 2019-11-05 2021-05-05 Общество С Ограниченной Ответственностью «Яндекс» Method and system for selecting a label from a plurality of labels for a task in a crowdsourced environment
CN110930114B (en) * 2019-11-20 2022-08-23 北京航空航天大学 Crowdsourcing method for resisting collusion
CN111292062B (en) * 2020-02-10 2023-04-25 中南大学 Network embedding-based crowd-sourced garbage worker detection method, system and storage medium
RU2020107002A (en) 2020-02-14 2021-08-16 Общество С Ограниченной Ответственностью «Яндекс» METHOD AND SYSTEM FOR RECEIVING A LABEL FOR A DIGITAL PROBLEM PERFORMED IN A CROWDSORING ENVIRONMENT

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104133769A (en) * 2014-08-02 2014-11-05 哈尔滨理工大学 Crowdsourcing fraud detection method based on psychological behavior analysis
CN104599084A (en) * 2015-02-12 2015-05-06 北京航空航天大学 Crowd calculation quality control method and device
CN107273492A (en) * 2017-06-15 2017-10-20 复旦大学 A kind of exchange method based on mass-rent platform processes image labeling task

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10063593B2 (en) * 2015-12-29 2018-08-28 International Business Machines Corporation Propagating fraud awareness to hosted applications

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Algorithmic Mechanisms for Reliable Crowdsourcing Computation under Collusion";Antonio Fernández Anta 等;《PLOS ONE》;20150320;全文 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant