CN107767055A

CN107767055A - Crowdsourcing result aggregation method and device based on collusion detection

Info

Publication number: CN107767055A
Application number: CN201711003779.2A
Authority: CN
Inventors: 孙海龙; 王旭; 陈鹏鹏; 方毅立
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2017-10-24
Filing date: 2017-10-24
Publication date: 2018-03-06
Anticipated expiration: 2037-10-24
Also published as: CN107767055B

Abstract

The invention discloses a crowdsourcing result aggregation method and device based on collusion detection. The method includes: collecting, from a crowdsourcing platform, the answer set of each worker for a task set; calculating the aggregation result of the answer set, and calculating a consistency parameter between the aggregation result and each worker's answers; determining the repeated answer sets in the answer set, and calculating the worker ability change rate corresponding to each repeated answer set based on the consistency parameters of the workers' answers; for a repeated answer set whose worker ability change rate is less than or equal to a preset threshold, determining that the repeated answer set was generated normally and retaining it in the answer set; for a repeated answer set whose worker ability change rate is greater than the preset threshold, determining that the repeated answer set was generated by collusion and deleting it from the answer set; and obtaining an updated answer set and calculating the aggregation result of the updated answer set.

Description

A crowdsourcing result aggregation method and device based on collusion detection

Technical field

The present invention relates to the technical field of crowdsourcing, and in particular to a crowdsourcing result aggregation method and device based on collusion detection.

Background

Crowdsourcing is a rapidly developing field that aims to use human cognitive advantages to solve problems that are difficult for computers. General-purpose crowdsourcing platforms such as CrowdFlower and AMT are widely used for general data processing tasks such as sentiment analysis, handwriting recognition and image annotation. Since workers may return low-quality results, a central issue in crowdsourcing is guaranteeing the quality of results. A widely adopted quality control method is result aggregation, which first assigns each task to multiple workers and then uses an inference algorithm to aggregate the results returned by the workers. Taking image labeling as an example, an image is assigned to multiple workers, and each worker provides labels describing the content of the image. Finally, a high-quality result is aggregated from all collected labels by voting or another inference method.

In crowdsourcing, in order to earn more reward for less labor, colluders form collusion teams outside the platform through SMS, WeChat, phone calls, forums and even face-to-face communication. In a collusion team, only one worker processes the task and the other workers copy that worker's answers, so all workers in the team end up providing the same answers. These malicious repeated answers dominate the answers provided by normal workers during result aggregation and reduce the quality of the result. For example, if a task is assigned to five workers and three of them collude, then under majority voting the final aggregation result will equal the result provided by the colluders.
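The five-worker example above can be made concrete with a short sketch. The labels "cat" and "dog" and the helper name `majority_vote` are hypothetical and chosen only for illustration; the point is that three identical copied answers outvote two independent correct ones.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label (first-seen label wins a tie)."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical task whose true label is "cat": two independent workers
# answer correctly, while three colluders all submit one copied wrong answer.
independent = ["cat", "cat"]
colluders = ["dog", "dog", "dog"]

with_collusion = majority_vote(independent + colluders)
without_collusion = majority_vote(independent)
print(with_collusion, without_collusion)
```

With the colluders included the vote returns the copied answer; once their answers are removed, the independent workers' answer is recovered.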

It follows that repeated answers produced by collusion harm the result quality of general tasks on general-purpose platforms. However, existing collusion detection algorithms cannot effectively detect this kind of collusion or eliminate its negative impact.

Summary of the invention

To solve the above technical problems, embodiments of the present invention provide a crowdsourcing result aggregation method and device based on collusion detection.

The crowdsourcing result aggregation method based on collusion detection provided by an embodiment of the present invention includes:

collecting, from a crowdsourcing platform, the answer set of each worker for a task set;

calculating the aggregation result of the answer set, and calculating a consistency parameter between the aggregation result and each worker's answers;

determining the repeated answer sets in the answer set, and calculating the worker ability change rate corresponding to each repeated answer set based on the consistency parameters of the workers' answers;

for a repeated answer set whose worker ability change rate is less than or equal to a preset threshold, determining that the repeated answer set was generated normally and retaining it in the answer set;

for a repeated answer set whose worker ability change rate is greater than the preset threshold, determining that the repeated answer set was generated by collusion and deleting it from the answer set;

after each repeated answer set has been retained or deleted, obtaining an updated answer set and calculating the aggregation result of the updated answer set.
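The method does not spell out how a repeated answer set is identified. As a minimal sketch, assume a repeated answer set is a group of two or more workers whose answers are identical on every task in the task set. The function name `find_repeated_answer_sets` and the dictionary layout (worker id mapped to a list of labels) are assumptions made for this illustration.

```python
from collections import defaultdict


def find_repeated_answer_sets(answers):
    """Group workers whose answer vectors are identical.

    answers: dict mapping worker id -> list of labels, one per task.
    Returns a list of worker-id lists; only groups of size >= 2 count
    as repeated answer sets (an assumption of this sketch).
    """
    groups = defaultdict(list)
    for worker, labels in answers.items():
        groups[tuple(labels)].append(worker)
    return [sorted(workers) for workers in groups.values() if len(workers) >= 2]


# Hypothetical answers of five workers on three binary tasks.
answers = {
    "w1": [1, 0, 1],
    "w2": [1, 0, 1],
    "w3": [0, 0, 1],
    "w4": [1, 0, 1],
    "w5": [0, 1, 1],
}
print(find_repeated_answer_sets(answers))
```

Exact matching is the simplest choice; whether a given group is normal or collusive is decided later by the worker ability change rate, not by this grouping step.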

In an embodiment of the present invention, calculating the consistency parameter between the aggregation result and each worker's answers includes:

calculating the consistency parameter between the aggregation result and each worker's answers based on the following formula:

[formula image not reproduced]

where P_i is the consistency parameter between the aggregation result and the answers of worker i, L_i is the set of answers returned by worker i for the task set, and the remaining term of the formula is the aggregation result of the answer set.
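The formula itself does not survive in this text. One natural reading of "consistency between the aggregation result and the answers of worker i", offered here only as an assumption, is the fraction of tasks on which worker i's answer matches the aggregated answer. The symbols T (the task set), l_{ij} (worker i's answer to task j, an element of L_i) and \hat{l}_j (the aggregated answer for task j) are notation introduced for this illustration and do not come from the source.

```latex
P_i \;=\; \frac{1}{|T|} \sum_{j=1}^{|T|} \mathbf{1}\!\left[\, l_{ij} = \hat{l}_j \,\right]
```

Under this reading P_i lies between 0 and 1, and a worker who agrees with the aggregated answer on 3 of 4 tasks has P_i = 0.75.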

In an embodiment of the present invention, calculating the worker ability change rate corresponding to each repeated answer set based on the consistency parameters of the workers' answers includes:

calculating a first variance of the consistency parameters of the workers' answers when the repeated answer set is retained in the answer set;

calculating a second variance of the consistency parameters of the workers' answers when the repeated answer set is deleted from the answer set;

calculating the worker ability change rate corresponding to the repeated answer set based on the first variance and the second variance.

In an embodiment of the present invention, calculating the first variance of the consistency parameters of the workers' answers when the repeated answer set is retained in the answer set includes:

calculating the first variance by the following formula:

[formula image not reproduced]

where Var(P) is the first variance, E(P) is the mean of the workers' consistency parameters, and P_i is the consistency parameter between the aggregation result and the answers of worker i, all computed on the full answer set.

In an embodiment of the present invention, calculating the second variance of the consistency parameters of the workers' answers when the repeated answer set is deleted from the answer set includes:

calculating the second variance by the following formula:

[formula image not reproduced]

where Var(P^k) is the second variance, E(P^k) is the mean of the workers' consistency parameters, and P_i^k is the consistency parameter between the aggregation result and the answers of worker i, all computed on the answer set after the repeated answer set has been deleted.

In an embodiment of the present invention, calculating the worker ability change rate corresponding to the repeated answer set based on the first variance and the second variance includes:

calculating the worker ability change rate corresponding to the repeated answer set based on the following formula:

[formula image not reproduced]

where the left-hand side of the formula is the worker ability change rate.
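Since the variance and change-rate formulas are not reproduced in this text, the block below gives one consistent reading, as an assumption rather than the patent's exact definition. The variances are taken as ordinary population variances over the n workers in the full answer set and the n_k workers left after deleting repeated answer set k. The change rate R_k is assumed to be the relative change between the two variances; the symbol R_k and the relative form are introduced here for illustration only.

```latex
E(P) = \frac{1}{n}\sum_{i=1}^{n} P_i,
\qquad
\mathrm{Var}(P) = \frac{1}{n}\sum_{i=1}^{n}\bigl(P_i - E(P)\bigr)^2

E(P^k) = \frac{1}{n_k}\sum_{i=1}^{n_k} P_i^k,
\qquad
\mathrm{Var}(P^k) = \frac{1}{n_k}\sum_{i=1}^{n_k}\bigl(P_i^k - E(P^k)\bigr)^2

R_k = \frac{\bigl|\mathrm{Var}(P^k) - \mathrm{Var}(P)\bigr|}{\mathrm{Var}(P)}
```

With hypothetical values Var(P) = 0.20 and Var(P^k) = 0.05, this gives R_k = 0.15 / 0.20 = 0.75, which would be compared against the preset threshold.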

The crowdsourcing result aggregation device based on collusion detection provided by an embodiment of the present invention includes:

a collection module, configured to collect, from a crowdsourcing platform, the answer set of each worker for a task set;

a consistency calculation module, configured to calculate the aggregation result of the answer set and to calculate a consistency parameter between the aggregation result and each worker's answers;

a worker ability change rate module, configured to determine the repeated answer sets in the answer set and to calculate the worker ability change rate corresponding to each repeated answer set based on the consistency parameters of the workers' answers;

a collusion detection module, configured to: for a repeated answer set whose worker ability change rate is less than or equal to a preset threshold, determine that the repeated answer set was generated normally and retain it in the answer set; and for a repeated answer set whose worker ability change rate is greater than the preset threshold, determine that the repeated answer set was generated by collusion and delete it from the answer set;

an aggregation module, configured to obtain an updated answer set after each repeated answer set has been retained or deleted, and to calculate the aggregation result of the updated answer set.

In an embodiment of the present invention, the consistency calculation module is specifically configured to calculate the consistency parameter between the aggregation result and each worker's answers using the same formula as in the method described above.

In an embodiment of the present invention, the worker ability change rate module includes:

a first variance calculation unit, configured to calculate the first variance of the consistency parameters of the workers' answers when the repeated answer set is retained in the answer set;

a second variance calculation unit, configured to calculate the second variance of the consistency parameters of the workers' answers when the repeated answer set is deleted from the answer set;

a worker ability change rate calculation unit, configured to calculate the worker ability change rate corresponding to the repeated answer set based on the first variance and the second variance.

In an embodiment of the present invention, the first variance calculation unit is specifically configured to calculate the first variance using the same formula as in the method described above.

In an embodiment of the present invention, the second variance calculation unit is specifically configured to calculate the second variance using the same formula as in the method described above.

In an embodiment of the present invention, the worker ability change rate calculation unit is specifically configured to calculate the worker ability change rate corresponding to the repeated answer set using the same formula as in the method described above.

The technical solutions of the embodiments of the present invention have the following characteristics:

(1) Unlike spatio-temporal crowdsourcing and social network scenarios, on a general-purpose platform the features of the answers to general tasks are unknown. The embodiment of the present invention therefore introduces the concept of consistency between workers' answers and the aggregation result to describe the influence of collusion-generated repeated answers on result aggregation.

(2) Unlike the similarity-based collusion detection algorithms used on e-commerce platforms, the embodiment of the present invention proposes a collusion detection method based on the rate of change of worker performance, which can identify collusion-generated repeated answers within an answer set that also contains normal repeated answers.

(3) The embodiment of the present invention proposes a crowdsourcing result aggregation method with collusion detection, which can effectively eliminate the negative impact of collusion on result aggregation.

Description of drawings

FIG. 1 is a schematic diagram of a crowdsourcing framework based on collusion detection according to an embodiment of the present invention;

FIG. 2 is a schematic flowchart of a crowdsourcing result aggregation method based on collusion detection according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a crowdsourcing result aggregation device based on collusion detection according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of the worker ability change rate module according to an embodiment of the present invention.

Detailed description

To provide a more detailed understanding of the features and technical content of the embodiments of the present invention, their implementation is described in detail below with reference to the accompanying drawings. The drawings are for reference and illustration only and are not intended to limit the embodiments of the present invention.

Existing collusion detection algorithms cannot effectively detect collusion or eliminate its negative impact, mainly for the following reasons:

(1) Collusion detection algorithms for spatio-temporal crowdsourcing and social networks need to extract features of the data in order to detect collusion. In spatio-temporal crowdsourcing, for example, the spatial and temporal features of the collected data are used. Such features are difficult to obtain on general-purpose crowdsourcing platforms.

(2) Detection algorithms for e-commerce platforms mainly detect collusion based on the similarity between the answers provided by each pair of workers. On a general-purpose platform, however, repeated answers to a task are either normal repetitions or collusive repetitions. On simple tasks workers show high ability, so many of the repeated answers are produced normally. Testing for collusion based on answer similarity therefore misjudges normally repeated answers as collusion-generated answers.

(3) On auction platforms, bidders often collude to obtain high returns at low cost. The corresponding algorithms mainly detect collusion based on game theory and are difficult to apply to general tasks on general-purpose platforms.

In summary, for general tasks on general-purpose platforms, existing algorithms cannot effectively detect collusion-generated repeated answers or eliminate their harm to result quality. To address this problem, the technical solution of the embodiment of the present invention proposes a crowdsourcing quality control method based on collusion detection.

FIG. 1 is a schematic diagram of a crowdsourcing framework based on collusion detection according to an embodiment of the present invention. As shown in FIG. 1, the framework includes the following steps:

(1) Requesters publish tasks to a crowdsourcing platform, such as Mechanical Turk, and reward workers according to the quality of their answers.

(2) Tasks are assigned to workers according to the scheduling policy and the platform constraints specified by the user.

(3) In practice, some workers are not independent and may even process crowdsourcing tasks collaboratively outside the platform. Workers may collude with each other behind the scenes; for example, a worker may copy, through an online forum, the answers of others doing the same crowdsourcing work. After the tasks are processed, the answers are collected and obviously noisy answers are eliminated, for example answers that are clearly unrelated to the picture in an image labeling task.

(4) This step involves collusion detection and result aggregation. After all answers returned by the workers have been collected, the embodiment of the present invention applies a collusion detection mechanism to detect collusion and then filters out the repeated answers generated by colluders. After filtering, the embodiment uses an aggregation method to infer the final result of each task and submits it to the requester.

The core of the framework is step (4), which contains the collusion detection method proposed by the embodiment of the present invention, followed by a result inference method, so that high-quality results can be inferred even in the presence of collusion.

The collusion-detecting crowdsourcing framework proposed by the embodiment of the present invention addresses the problem that existing result aggregation algorithms cannot effectively eliminate the harm of collusion to result aggregation. Unlike a general crowdsourcing framework, the workers in the proposed framework are no longer assumed to be independent; they may communicate or even collude with each other. In addition, the result inference part of the framework includes a collusion detection process.

The technical solution of the embodiment of the present invention consists of three main steps: collusion detection, result filtering and result aggregation. These steps are described below.

Step 1: Collusion detection

(1) Calculate the consistency between the aggregation result and the workers' answers. When the workers have finished processing the tasks, the answers they return are first collected. Suppose the workers return an answer set for the task set, and consider one repeated answer set within that answer set. The purpose of the embodiment of the present invention is to judge whether this repeated answer set was produced by collusion and, on that basis, to aggregate the answer set so as to obtain a high-quality result.

The answer set is aggregated by majority voting to obtain the aggregation result. The embodiment of the present invention gives the following formula for the consistency between the aggregation result and the answers of worker i:

[formula image not reproduced]

where L_i is the set of answers returned by worker i for the task set.
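A minimal sketch of this sub-step follows. It aggregates by per-task majority vote, as the text states, and then scores each worker. Because the consistency formula is not reproduced here, the score used is an assumption: the fraction of tasks on which the worker's answer equals the aggregated answer. The names `aggregate_majority` and `consistency` are hypothetical.

```python
from collections import Counter


def aggregate_majority(answers):
    """Per-task majority vote. answers: dict worker id -> list of labels."""
    columns = zip(*answers.values())  # one tuple of labels per task
    return [Counter(col).most_common(1)[0][0] for col in columns]


def consistency(answers):
    """Assumed P_i: share of tasks where worker i matches the aggregate."""
    agg = aggregate_majority(answers)
    return {
        worker: sum(a == b for a, b in zip(labels, agg)) / len(agg)
        for worker, labels in answers.items()
    }


# Hypothetical answers of three workers on four binary tasks.
answers = {
    "w1": [1, 0, 1, 1],
    "w2": [1, 0, 0, 1],
    "w3": [0, 0, 1, 1],
}
print(aggregate_majority(answers))
print(consistency(answers))
```

Here the aggregate is `[1, 0, 1, 1]`, so w1 agrees on all four tasks while w2 and w3 each disagree on one.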

(2) Calculate the worker ability change rate corresponding to each repeated answer set. For a given repeated answer set, the worker ability change rate measures the overall effect of that set on the consistency between workers' answers and the aggregation result. The embodiment of the present invention formalizes the worker ability change rate as the change in the variance of overall consistency before and after the repeated answer set is deleted. First, the variance of worker answer consistency is calculated with the repeated answer set retained:

[formula image not reproduced]

Deleting the repeated answer set yields a reduced answer set. Similarly, the variance of worker answer consistency is calculated with the repeated answer set deleted:

[formula image not reproduced]

Then, from the two variances above, the worker ability change rate is obtained:

[formula image not reproduced]

(3) Judge whether the repeated answers were produced by collusion. When the worker ability change rate is less than or equal to the threshold Threshold, the repeated answer set is considered a normal repetition. When the worker ability change rate is greater than Threshold, the repeated answer set is considered a collusive repetition.

In the above scheme, when calculating the variance of worker answer consistency for a repeated answer set, the aggregation result may also be obtained with other aggregation algorithms, for example probabilistic aggregation methods.
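Sub-steps (2) and (3) can be sketched as follows, starting from two lists of consistency values: one computed with the repeated answer set retained and one with it deleted. The exact change-rate formula is not reproduced in this text, so the relative variance change used below is an assumption, as are the function names and the threshold value 0.5.

```python
from statistics import pvariance


def ability_change_rate(p_keep, p_drop):
    """Assumed change rate: relative change in population variance.

    p_keep: consistency values with the repeated answer set retained.
    p_drop: consistency values after the repeated answer set is deleted.
    """
    var_keep = pvariance(p_keep)
    var_drop = pvariance(p_drop)
    if var_keep == 0:
        # Guard against division by zero (a choice made for this sketch).
        return 0.0 if var_drop == 0 else float("inf")
    return abs(var_drop - var_keep) / var_keep


def is_collusive(p_keep, p_drop, threshold):
    """Rate <= threshold: normal repetition; rate > threshold: collusion."""
    return ability_change_rate(p_keep, p_drop) > threshold


# Hypothetical values: three copied answers look perfectly consistent while
# they dominate the vote; the two remaining workers agree once they are gone.
p_keep = [1.0, 1.0, 1.0, 0.5, 0.5]
p_drop = [1.0, 1.0]
print(ability_change_rate(p_keep, p_drop), is_collusive(p_keep, p_drop, 0.5))
```

The comparison direction follows the text: only a rate strictly greater than the threshold marks the set as collusive.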

Step 2: Result filtering

Step 1 above is repeated to test every repeated answer set in the answer set. Repeated answers judged to be produced by collusion are deleted, and repeated answers judged to be normal repetitions are retained.
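The filtering step can be sketched as a loop over the repeated answer sets, with the collusion decision passed in as a function. The name `filter_answers`, the data layout, and the choice to test every set against the original (unfiltered) answers are assumptions of this sketch; the predicate in the usage example is a deliberately trivial stand-in, not the patent's test.

```python
def filter_answers(answers, repeated_sets, is_collusive):
    """Delete repeated answer sets judged collusive; retain all others.

    answers: dict worker id -> list of labels.
    repeated_sets: iterable of worker-id collections.
    is_collusive: callable (answers, workers) -> bool.
    """
    to_delete = set()
    for workers in repeated_sets:
        if is_collusive(answers, workers):
            to_delete.update(workers)
    return {w: labels for w, labels in answers.items() if w not in to_delete}


# Hypothetical data with two repeated answer sets.
answers = {
    "w1": [1, 1], "w2": [1, 1],                  # repeated set A
    "w3": [0, 0], "w4": [0, 0], "w5": [0, 0],    # repeated set B
    "w6": [1, 0],
}
repeated = [["w1", "w2"], ["w3", "w4", "w5"]]

# Stand-in predicate for illustration only: flag sets of three or more.
filtered = filter_answers(answers, repeated, lambda a, ws: len(ws) >= 3)
print(sorted(filtered))
```

Passing the decision as a callable keeps the filtering logic independent of how the worker ability change rate is actually computed.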

Step 3: Result aggregation

An existing result aggregation algorithm is applied to the filtered answer set to produce the final result.

FIG. 2 is a schematic flowchart of a crowdsourcing result aggregation method based on collusion detection according to an embodiment of the present invention. As shown in FIG. 2, the method includes the following steps:

Step 201: Collect, from the crowdsourcing platform, the answer set of each worker for the task set.

Step 202: Calculate the aggregation result of the answer set, and calculate the consistency parameter between the aggregation result and each worker's answers.

In the embodiment of the present invention, the consistency parameter between the aggregation result and each worker's answers is calculated based on the following formula:

[formula image not reproduced]

Step 203: Determine the repeated answer sets in the answer set, and calculate the worker ability change rate corresponding to each repeated answer set based on the consistency parameters of the workers' answers.

Here, the first variance of the consistency parameters of the workers' answers, with the repeated answer set retained in the answer set, and the second variance, with the repeated answer set deleted from the answer set, are calculated by the following formulas:

[formula images not reproduced]

where Var(P^k) is the second variance, E(P^k) is the mean of the workers' consistency parameters, and P_i^k is the consistency parameter between the aggregation result and the answers of worker i, all computed on the answer set after deletion.

Step 204: For a repeated answer set whose worker ability change rate is less than or equal to the preset threshold, determine that the repeated answer set was generated normally and retain it in the answer set.

Step 205: For a repeated answer set whose worker ability change rate is greater than the preset threshold, determine that the repeated answer set was generated by collusion and delete it from the answer set.

Step 206: After each repeated answer set has been retained or deleted, obtain an updated answer set and calculate the aggregation result of the updated answer set.
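Steps 201 to 206 can be put together in a compact end-to-end sketch. It is illustrative only and rests on several assumptions that are not confirmed by the source: majority voting as the aggregation algorithm, consistency as the fraction of tasks matching the aggregate, repeated answer sets as groups of workers with identical answers, a relative variance change as the change rate, and a threshold of 0.5. All names and data are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import pvariance


def aggregate(answers):
    """Majority vote per task; answers maps worker id -> list of labels."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*answers.values())]


def consistency_values(answers):
    """Assumed P_i per worker: share of tasks matching the aggregate."""
    agg = aggregate(answers)
    return [sum(a == b for a, b in zip(labels, agg)) / len(agg)
            for labels in answers.values()]


def repeated_sets(answers):
    """Groups of two or more workers with identical answer vectors."""
    groups = defaultdict(list)
    for worker, labels in answers.items():
        groups[tuple(labels)].append(worker)
    return [ws for ws in groups.values() if len(ws) >= 2]


def change_rate(answers, workers):
    """Assumed rate: relative variance change after deleting `workers`."""
    var_keep = pvariance(consistency_values(answers))
    remaining = {w: l for w, l in answers.items() if w not in workers}
    var_drop = pvariance(consistency_values(remaining)) if remaining else 0.0
    if var_keep == 0:
        return 0.0 if var_drop == 0 else float("inf")
    return abs(var_drop - var_keep) / var_keep


def aggregate_with_collusion_detection(answers, threshold):
    deleted = set()
    for workers in repeated_sets(answers):              # step 203
        if change_rate(answers, workers) > threshold:   # steps 204 and 205
            deleted.update(workers)
    updated = {w: l for w, l in answers.items() if w not in deleted}
    return aggregate(updated), sorted(deleted)          # step 206


# Hypothetical data: true labels are [1, 1, 0, 0]; w4..w7 copy one wrong answer.
answers = {
    "w1": [1, 1, 0, 0], "w2": [1, 1, 0, 1], "w3": [1, 0, 0, 0],
    "w4": [0, 0, 1, 0], "w5": [0, 0, 1, 0],
    "w6": [0, 0, 1, 0], "w7": [0, 0, 1, 0],
}
result, removed = aggregate_with_collusion_detection(answers, threshold=0.5)
print(result, removed)
```

On this data plain majority voting over all seven workers returns the copied answer `[0, 0, 1, 0]`, whereas the sketch removes the four identical workers and recovers `[1, 1, 0, 0]`. The outcome depends on the assumed change-rate form and threshold, so it should not be read as evidence about the patented method's accuracy.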

The collusion detection method proposed by the embodiment of the present invention can detect collusion groups with high precision from the results given by the workers. The worker ability change rate is formalized as the change in the variance of the consistency between workers' answers and the result before and after a repeated answer set is deleted, and the magnitude of this change rate is used to detect collusion. Aggregating only after the collusion-generated results have been deleted, as proposed by the embodiment, can greatly improve the accuracy of the aggregation result. Unlike existing aggregation algorithms, the proposed result aggregation method includes the detection of collusion, can effectively eliminate its negative impact on result aggregation, and improves result quality.

FIG. 3 is a schematic structural diagram of a crowdsourcing result aggregation device based on collusion detection according to an embodiment of the present invention. As shown in FIG. 3, the device includes:

a collection module 301, configured to collect, from a crowdsourcing platform, the answer set of each worker for a task set;

a consistency calculation module 302, configured to calculate the aggregation result of the answer set and to calculate a consistency parameter between the aggregation result and each worker's answers;

a worker ability change rate module 303, configured to determine the repeated answer sets in the answer set and to calculate the worker ability change rate corresponding to each repeated answer set based on the consistency parameters of the workers' answers;

a collusion detection module 304, configured to: for a repeated answer set whose worker ability change rate is less than or equal to a preset threshold, determine that the repeated answer set was generated normally and retain it in the answer set; and for a repeated answer set whose worker ability change rate is greater than the preset threshold, determine that the repeated answer set was generated by collusion and delete it from the answer set;

an aggregation module 305, configured to obtain an updated answer set after each repeated answer set has been retained or deleted, and to calculate the aggregation result of the updated answer set.

In one embodiment of the present invention, the consistency calculation module 302 is specifically configured to calculate the consistency parameter between the aggregation result and each worker's answers using the same formula as in the method described above.

In one embodiment of the present invention, as shown in FIG. 4, the worker ability change rate module 303 includes:

a first variance calculation unit 3031, configured to calculate the first variance of the consistency parameters of the workers' answers when the repeated answer set is retained in the answer set;

a second variance calculation unit 3032, configured to calculate the second variance of the consistency parameters of the workers' answers when the repeated answer set is deleted from the answer set;

a worker ability change rate calculation unit 3033, configured to calculate the worker ability change rate corresponding to the repeated answer set based on the first variance and the second variance.

In one embodiment of the present invention, the first variance calculation unit 3031 is specifically configured to calculate the first variance using the same formula as in the method described above.

In one embodiment of the present invention, the second variance calculation unit 3032 is specifically configured to calculate the second variance using the same formula as in the method described above.

In one embodiment of the present invention, the worker ability change rate calculation unit 3033 is specifically configured to calculate the worker ability change rate corresponding to the repeated answer set using the same formula as in the method described above.

本领域技术人员应当理解，图3所示的基于串谋检测的众包结果汇聚装置中的各模块的实现功能可参照前述基于串谋检测的众包结果汇聚方法的相关描述而理解，图3所示的基于串谋检测的众包结果汇聚装置中的各模块的实现功能可通过运行于处理器上的程序而实现，也可通过具体的逻辑电路而实现。Those skilled in the art should understand that the implementation functions of each module in the crowdsourcing result aggregation device based on collusion detection shown in FIG. The implementation functions of each module in the collusion detection-based crowdsourcing result aggregation device shown can be realized by a program running on a processor, or by a specific logic circuit.

The technical solutions described in the embodiments of the present invention may be combined arbitrarily, provided they do not conflict.

In the several embodiments provided by the present invention, it should be understood that the disclosed method and smart device may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function; in actual implementation there may be other ways of dividing, for example: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.

The units described above as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present invention may all be integrated into one processing unit, or each unit may serve separately as a single unit, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.

Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes various media capable of storing program code, such as a removable storage device, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Alternatively, if the above device of the embodiments of the present invention is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the embodiments of the present invention, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the methods described in the embodiments of the present invention. The storage medium includes various media capable of storing program code, such as a removable storage device, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

The above is only a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention.

Claims

1. A crowdsourcing result aggregation method based on collusion detection, characterized in that the method comprises:

collecting, from a crowdsourcing platform, the answer set of each worker for a task set;

calculating an aggregation result of the answer set, and calculating a consistency parameter between the aggregation result and each worker's answers;

determining repeated answer sets from the answer set, and calculating a worker ability change rate corresponding to each repeated answer set based on the consistency parameters of the workers' answers;

for a repeated answer set whose worker ability change rate is less than or equal to a preset threshold, determining that the repeated answer set was generated normally and retaining the repeated answer set in the answer set;

for a repeated answer set whose worker ability change rate is greater than the preset threshold, determining that the repeated answer set was generated by collusion and deleting the repeated answer set from the answer set;

after each repeated answer set has been retained or deleted, obtaining an updated answer set, and calculating an aggregation result of the updated answer set.
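The control flow of claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: majority voting stands in for the aggregation step, a repeated answer set is taken to be a group of workers whose complete answers are identical, and the change rate is supplied by a caller-provided function `rate_fn`. All function and variable names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, FrozenSet, List

# answers[worker][task] = label. This layout is an assumption for illustration.
Answers = Dict[str, Dict[str, str]]


def majority_vote(answers: Answers) -> Dict[str, str]:
    """Aggregate by majority vote (assumed stand-in for the aggregation step)."""
    votes: Dict[str, Counter] = defaultdict(Counter)
    for labels in answers.values():
        for task, label in labels.items():
            votes[task][label] += 1
    return {task: counts.most_common(1)[0][0] for task, counts in votes.items()}


def find_repeated_sets(answers: Answers) -> List[FrozenSet[str]]:
    """Group workers whose complete answers are identical (assumed criterion)."""
    groups: Dict[tuple, set] = defaultdict(set)
    for worker, labels in answers.items():
        groups[tuple(sorted(labels.items()))].add(worker)
    return [frozenset(g) for g in groups.values() if len(g) > 1]


def aggregate_with_collusion_filter(
    answers: Answers,
    rate_fn: Callable[[Answers, FrozenSet[str]], float],
    threshold: float,
) -> Dict[str, str]:
    """Delete repeated sets whose change rate exceeds the threshold, then re-aggregate."""
    kept = dict(answers)
    for group in find_repeated_sets(answers):
        # Each rate is evaluated against the original answer set (an assumption).
        if rate_fn(answers, group) > threshold:
            for worker in group:
                kept.pop(worker, None)  # treated as produced by collusion
        # Otherwise the repeated set is treated as normally generated and retained.
    return majority_vote(kept)
```

Passing the rate computation in as a function keeps the retain-or-delete loop separate from the variance-based formula, which the dependent claims define.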

2. The crowdsourcing result aggregation method based on collusion detection according to claim 1, characterized in that calculating the consistency parameter between the aggregation result and each worker's answers comprises:

calculating the consistency parameter between the aggregation result and each worker's answers based on the following formula:

where P_i is the consistency parameter between the aggregation result and the answers of worker i, L_i is the set of answers returned by worker i for the task set, and the remaining term is the aggregation result of the answer set.
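The formula itself is not reproduced in this text. As one plausible reading, offered only as an assumption, the consistency parameter could be the fraction of worker i's answers that match the aggregation result. The sketch below uses that definition; the function name `consistency` and the dictionary layout are hypothetical.

```python
from typing import Dict


def consistency(worker_labels: Dict[str, str], aggregated: Dict[str, str]) -> float:
    """Fraction of the worker's answered tasks whose label equals the aggregated label.

    This agreement ratio is an assumed definition of P_i, not the claimed formula.
    """
    if not worker_labels:
        return 0.0  # a worker with no answers is given zero consistency here
    hits = sum(
        1 for task, label in worker_labels.items() if aggregated.get(task) == label
    )
    return hits / len(worker_labels)


# Worker i agrees with the aggregation result on three of four tasks.
aggregated = {"t1": "cat", "t2": "dog", "t3": "cat", "t4": "bird"}
L_i = {"t1": "cat", "t2": "dog", "t3": "dog", "t4": "bird"}
P_i = consistency(L_i, aggregated)  # 0.75
```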

3. The crowdsourcing result aggregation method based on collusion detection according to claim 2, characterized in that calculating the worker ability change rate corresponding to each repeated answer set based on the consistency parameters of the workers' answers comprises:

calculating a first variance of the consistency parameters of the workers' answers when the repeated answer set is retained in the answer set;

calculating a second variance of the consistency parameters of the workers' answers when the repeated answer set is deleted from the answer set;

calculating the worker ability change rate corresponding to the repeated answer set based on the first variance and the second variance.

4. The crowdsourcing result aggregation method based on collusion detection according to claim 3, characterized in that calculating the first variance of the consistency parameters of the workers' answers when the repeated answer set is retained in the answer set comprises:

calculating, by the following formula, the first variance of the consistency parameters of the workers' answers when the repeated answer set is retained in the answer set:

where Var(P) is the first variance, E(P) is the mean of the workers' consistency parameters, and P_i is the consistency parameter between the aggregation result and the answers of worker i, computed on the answer set.

5. The crowdsourcing result aggregation method based on collusion detection according to claim 3 or 4, characterized in that calculating the second variance of the consistency parameters of the workers' answers when the repeated answer set is deleted from the answer set comprises:

calculating, by the following formula, the second variance of the consistency parameters of the workers' answers when the repeated answer set is deleted from the answer set:

where Var(P^k) is the second variance, E(P^k) is the mean of the workers' consistency parameters, and P_i^k is the consistency parameter between the aggregation result and the answers of worker i, computed on the answer set with the repeated answer set deleted.
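The two variance formulas are not reproduced in this text. Given the symbols named in claims 4 and 5, the standard population variance is a plausible form. It is written out below only as an assumption. The worker counts n (with the repeated answer set retained) and n_k (with the k-th repeated answer set deleted) are introduced here for illustration and do not appear in the source.

```latex
E(P) = \frac{1}{n}\sum_{i=1}^{n} P_i, \qquad
\mathrm{Var}(P) = \frac{1}{n}\sum_{i=1}^{n}\bigl(P_i - E(P)\bigr)^2

E(P^k) = \frac{1}{n_k}\sum_{i=1}^{n_k} P_i^k, \qquad
\mathrm{Var}(P^k) = \frac{1}{n_k}\sum_{i=1}^{n_k}\bigl(P_i^k - E(P^k)\bigr)^2
```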

6. The crowdsourcing result aggregation method based on collusion detection according to claim 5, characterized in that calculating the worker ability change rate corresponding to the repeated answer set based on the first variance and the second variance comprises:

calculating, based on the following formula, the worker ability change rate corresponding to the repeated answer set:

where the value given by the formula is the worker ability change rate.
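The change rate formula is not reproduced in this text, so the following sketch uses a hypothetical definition: the relative change between the two variances, |Var(P^k) - Var(P)| / Var(P), with population variance. Both the formula and the function name `change_rate` are assumptions made for illustration.

```python
from statistics import pvariance
from typing import Sequence


def change_rate(p_keep: Sequence[float], p_drop: Sequence[float]) -> float:
    """Relative change in variance of the consistency parameters (assumed formula).

    p_keep: consistency parameters with the repeated answer set retained.
    p_drop: consistency parameters with the repeated answer set deleted.
    Both sequences must be non-empty, otherwise pvariance raises StatisticsError.
    """
    var_keep = pvariance(p_keep)  # first variance, Var(P)
    var_drop = pvariance(p_drop)  # second variance, Var(P^k)
    if var_keep == 0.0:
        # Avoid division by zero: no change if both variances are zero.
        return 0.0 if var_drop == 0.0 else float("inf")
    return abs(var_drop - var_keep) / var_keep


# Deleting the repeated set removes all spread in this toy example.
rate = change_rate([1.0, 1.0, 0.0, 0.0], [1.0, 1.0])  # 1.0
```

Under claim 1, a repeated answer set with a rate above the preset threshold would be treated as produced by collusion and deleted; otherwise it would be retained.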

7. A crowdsourcing result aggregation device based on collusion detection, characterized in that the device comprises:

a collection module, configured to collect, from a crowdsourcing platform, the answer set of each worker for a task set;

a consistency calculation module, configured to calculate an aggregation result of the answer set, and to calculate a consistency parameter between the aggregation result and each worker's answers;

a worker ability change rate module, configured to determine repeated answer sets from the answer set, and to calculate the worker ability change rate corresponding to each repeated answer set based on the consistency parameters of the workers' answers;

a collusion detection module, configured to, for a repeated answer set whose worker ability change rate is less than or equal to a preset threshold, determine that the repeated answer set was generated normally and retain the repeated answer set in the answer set; and, for a repeated answer set whose worker ability change rate is greater than the preset threshold, determine that the repeated answer set was generated by collusion and delete the repeated answer set from the answer set;

an aggregation module, configured to obtain an updated answer set after each repeated answer set has been retained or deleted, and to calculate an aggregation result of the updated answer set.

8. The crowdsourcing result aggregation device based on collusion detection according to claim 7, wherein the consistency calculation module is specifically configured to calculate the consistency parameter between the aggregation result and each worker's answers based on the following formula:

9. The crowdsourcing result aggregation device based on collusion detection according to claim 7, wherein the worker ability change rate module comprises:

a first variance calculation unit, configured to calculate a first variance of the consistency parameters of the workers' answers when the repeated answer set is retained in the answer set;

a second variance calculation unit, configured to calculate a second variance of the consistency parameters of the workers' answers when the repeated answer set is deleted from the answer set;

a worker ability change rate calculation unit, configured to calculate the worker ability change rate corresponding to the repeated answer set based on the first variance and the second variance.

10. The crowdsourcing result aggregation device based on collusion detection according to claim 9, wherein the first variance calculation unit is specifically configured to calculate, by the following formula, the first variance of the consistency parameters of the workers' answers when the repeated answer set is retained in the answer set:

11. The crowdsourcing result aggregation device based on collusion detection according to claim 9 or 10, wherein the second variance calculation unit is specifically configured to calculate, by the following formula, the second variance of the consistency parameters of the workers' answers when the repeated answer set is deleted from the answer set:

12. The crowdsourcing result aggregation device based on collusion detection according to claim 11, wherein the worker ability change rate calculation unit is specifically configured to calculate, based on the following formula, the worker ability change rate corresponding to the repeated answer set:

where the value given by the formula is the worker ability change rate.