CN111339068A - Crowdsourcing quality control method, apparatus, computer storage medium and computing device

Info

Publication number
CN111339068A
Authority
CN
China
Prior art keywords
crowdsourcing
data
redundant data
work unit
task
Prior art date
Legal status
Granted
Application number
CN201811554257.6A
Other languages
Chinese (zh)
Other versions
CN111339068B (en)
Inventor
耿仕强
Current Assignee
Beijing Qihoo Technology Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd
Priority to CN201811554257.6A
Publication of CN111339068A
Application granted
Publication of CN111339068B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00: Administration; Management
    • G06Q 10/06: Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q 10/063: Operations research, analysis or management
    • G06Q 10/0631: Resource planning, allocation, distributing or scheduling for enterprises or organisations

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Educational Administration (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a crowdsourcing quality control method and device. The method first randomly extracts a certain proportion of data from a crowdsourcing task and copies the extracted data multiple times; the extracted data together with the copies serve as redundant data, while the data that is not extracted serves as non-redundant data. The redundant and non-redundant data are then distributed to crowdsourcing work units for labeling, with the guarantee that the task of any single crowdsourcing work unit contains no repeated data. After labeling is finished, a final answer for the redundant data is derived from its labeling results by a majority-element-finding algorithm, and each crowdsourcing work unit's labeling results on the redundant data are checked against this final answer to obtain the unit's accuracy on the redundant data. Finally, the overall completion of the crowdsourcing task is judged from each unit's accuracy on the redundant data. The invention achieves unsupervised crowdsourcing quality control and greatly reduces the capital and labor costs of crowdsourcing tasks.

Description

Crowdsourcing quality control method, apparatus, computer storage medium and computing device
Technical Field
The present invention relates to the field of internet technologies, and in particular, to a crowdsourcing quality control method, a crowdsourcing quality control apparatus, a computer storage medium, and a computing device.
Background
Crowdsourcing refers to the practice in which a company or organization outsources, on a voluntary basis, tasks formerly performed by employees to an unspecified (and usually large) network of people; it is a distributed problem-solving and production model open to the crowd on the internet. Crowdsourcing has the advantages of low cost, fast turnaround, and suitability for large-scale tasks, and has become increasingly popular. However, because crowdsourcing is loosely organized and crowdsourcing work units differ in domain knowledge, skill level, motivation, and diligence (for example, some work units are cheaters who aim only at quickly collecting rewards and treat the task with extreme carelessness), low-quality crowdsourcing results are common. The quality of crowdsourcing has therefore become the main concern about crowdsourcing technology today.
To address the problem of low crowdsourcing quality, research on crowdsourcing quality control strategies has gradually developed. Four techniques are currently used to control crowdsourcing quality:
First, repeating the crowdsourcing task to generate redundant data. That is, each task is distributed to multiple crowdsourcing work units simultaneously, and a statistical method (such as majority voting) is then used to derive the final result from the multiple results. However, this approach multiplies the task volume, with the attendant problem of sharply increased cost.
Second, checking annotation quality with gold standard data. Gold standard data refers to data with standard answers, generally produced by experts in the domain of the task. A certain proportion of gold standard data is mixed into each crowdsourcing work unit's task to test the quality of the worker's labeling results. However, labeling a large amount of gold standard data increases labor and capital costs.
Third, examining and testing crowdsourcing work units in advance. That is, before a crowdsourcing work unit receives a crowdsourcing task, it is tested with data whose answers are known, and only units that pass the assessment qualify to start the task. Many crowdsourcing tasks on Amazon's crowdsourcing platform take this approach. Its drawback is that a batch of data with standard answers must be prepared in advance, incurring extra cost. Moreover, it cannot detect the labeling quality of workers who, after qualifying, fail to complete the crowdsourcing task conscientiously.
Fourth, auditing the crowdsourcing results with full-time staff. That is, professionals are employed to fully or randomly inspect the crowdsourcing results and delete the unqualified ones. This approach, however, also suffers from high capital and labor costs and is unsuitable for large-scale crowdsourcing tasks.
In summary, while the prior art improves crowdsourcing quality to some extent, it increases labor and capital costs. A crowdsourcing quality control method that saves both capital and labor is therefore needed.
Disclosure of Invention
In view of the above, the present invention has been made to provide a crowdsourcing quality control method, a crowdsourcing quality control apparatus, a computer storage medium, and a computing device that overcome or at least partially address the above problems.
According to an aspect of the embodiments of the present invention, there is provided a crowdsourcing quality control method, including:
randomly extracting data from all crowdsourcing tasks according to a first designated proportion and making n copies of the extracted data, where the extracted data together with the copies serve as redundant data, the data that is not extracted serves as non-redundant data, and n is an integer greater than or equal to 2 and less than the number of crowdsourcing work units;
distributing a total task consisting of the non-redundant data and the redundant data to crowdsourcing work units for labeling, where the tasks assigned to the same crowdsourcing work unit contain no repeated redundant data;
after all the assigned tasks are labeled, obtaining a final answer for the redundant data from the labeling results of the redundant data in all assigned tasks by means of a majority-element-finding algorithm;
checking the labeling results of the redundant data in each crowdsourcing work unit's task against the final answer for the redundant data to obtain each unit's accuracy on the redundant data, and taking this accuracy as the accuracy of that unit's labeling results;
and processing the labeling results of the crowdsourcing task completed by each crowdsourcing work unit according to the accuracy of that unit's labeling results.
Optionally, distributing the total task consisting of the non-redundant data and the redundant data to crowdsourcing work units includes:
placing the redundant data and the non-redundant data into the tasks assigned to each crowdsourcing work unit in a second designated proportion, where the second designated proportion is determined by the total number of crowdsourcing tasks, the first designated proportion, and the number of copies n.
Optionally, distributing the total task consisting of the non-redundant data and the redundant data to crowdsourcing work units includes:
placing the redundant data and the non-redundant data into a data pool;
during labeling, each crowdsourcing work unit draws one piece of data from the data pool at a time for labeling until all data in the pool are labeled, where the same crowdsourcing work unit cannot draw repeated data.
Optionally, when the crowdsourcing work units draw data from the data pool for labeling, each unit keeps drawing data from the pool until the pool contains no more data that the unit has not labeled.
Optionally, the method further comprises:
for a crowdsourcing work unit that receives no redundant data, spot-checking the labeling results of the crowdsourcing task completed by that unit to obtain the accuracy of its labeling results.
Optionally, the majority-element-finding algorithm comprises a majority voting algorithm.
Optionally, checking the labeling results of the redundant data in each crowdsourcing work unit's task against the final answer for the redundant data includes:
comparing the labeling results of the redundant data in each unit's task with the final answer and marking each result as correct or incorrect.
Optionally, processing the labeling results of the crowdsourcing task completed by each crowdsourcing work unit according to the accuracy of that unit's labeling results includes:
comparing the accuracy of each unit's labeling results with a preset threshold;
and if the accuracy of a unit's labeling results is below the preset threshold, applying a processing measure to that unit.
Optionally, the preset threshold is set by the task publisher according to task requirements.
Optionally, the processing measure includes penalizing the crowdsourcing work unit and/or deleting the labeling results of that unit's crowdsourcing task.
Optionally, n is an integer in the range of 2 to 5.
According to another aspect of the embodiments of the present invention, there is also provided a crowdsourcing quality control apparatus, including:
a redundant data generation module adapted to randomly extract data from all crowdsourcing tasks according to a first designated proportion and make n copies of the extracted data, where the extracted data together with the copies serve as redundant data, the data that is not extracted serves as non-redundant data, and n is an integer greater than or equal to 2 and less than the number of crowdsourcing work units;
a crowdsourcing task allocation module adapted to distribute a total task consisting of the non-redundant data and the redundant data to crowdsourcing work units for labeling, where the tasks assigned to the same crowdsourcing work unit contain no repeated redundant data;
a data answer obtaining module adapted to obtain, after all the assigned tasks are labeled, a final answer for the redundant data from the labeling results of the redundant data in all assigned tasks by means of a majority-element-finding algorithm;
a redundant data checking module adapted to check the labeling results of the redundant data in each crowdsourcing work unit's task against the final answer for the redundant data to obtain each unit's accuracy on the redundant data, and to take this accuracy as the accuracy of that unit's labeling results; and
a labeling result processing module adapted to process the labeling results of the crowdsourcing task completed by each crowdsourcing work unit according to the accuracy of that unit's labeling results.
Optionally, the crowdsourcing task allocation module is further adapted to:
place the redundant data and the non-redundant data into the tasks assigned to each crowdsourcing work unit in a second designated proportion, where the second designated proportion is determined by the total number of crowdsourcing tasks, the first designated proportion, and the number of copies n.
Optionally, the crowdsourcing task allocation module is further adapted to:
place the redundant data and the non-redundant data into a data pool;
during labeling, let each crowdsourcing work unit draw one piece of data from the data pool at a time for labeling until all data in the pool are labeled, where the same crowdsourcing work unit cannot draw repeated data.
Optionally, the crowdsourcing task allocation module is further adapted to:
let each crowdsourcing work unit that draws data from the data pool keep drawing until the pool contains no more data that the unit has not labeled.
Optionally, the apparatus further comprises:
a spot-checking module adapted to spot-check the labeling results of the crowdsourcing task completed by a crowdsourcing work unit that receives no redundant data, to obtain the accuracy of that unit's labeling results.
Optionally, the majority-element-finding algorithm comprises a majority voting algorithm.
Optionally, the redundant data checking module is further adapted to:
compare the labeling results of the redundant data in each crowdsourcing work unit's task with the final answer for the redundant data and mark each result as correct or incorrect.
Optionally, the labeling result processing module is further adapted to:
compare the accuracy of each unit's labeling results with a preset threshold;
and if the accuracy of a unit's labeling results is below the preset threshold, apply a processing measure to that unit.
Optionally, the preset threshold is set by the task publisher according to task requirements.
Optionally, the processing measure includes penalizing the crowdsourcing work unit and/or deleting the labeling results of that unit's crowdsourcing task.
Optionally, n is an integer in the range of 2 to 5.
According to yet another aspect of embodiments of the present invention, there is also provided a computer storage medium having stored thereon computer program code which, when run on a computing device, causes the computing device to execute a crowdsourcing quality control method according to any one of the above.
According to still another aspect of the embodiments of the present invention, there is also provided a computing device including:
a processor; and
a memory storing computer program code;
the computer program code, when executed by the processor, causes the computing device to perform a crowdsourcing quality control method according to any one of the above.
According to the crowdsourcing quality control method and device provided by the embodiments of the present invention, data is randomly extracted from all crowdsourcing tasks according to a first designated proportion, several copies of the extracted data are made to serve as redundant data, and the single copy of the data that is not extracted serves as non-redundant data. The total task consisting of the redundant and non-redundant data is then distributed to crowdsourcing work units for labeling, with the guarantee that the tasks assigned to the same crowdsourcing work unit contain no repeated redundant data. After the crowdsourcing task is completed, a final answer for the redundant data is obtained from its labeling results by a majority-element-finding algorithm, and the labeling results of the redundant data in each unit's task are checked against this final answer to obtain each unit's accuracy on the redundant data. Finally, the overall completion of the crowdsourcing task is judged from each unit's accuracy on the redundant data, and the labeling results of the completed crowdsourcing task are processed accordingly. In this way, unsupervised crowdsourcing quality control is achieved while only slightly increasing the task volume, greatly reducing the capital and labor costs of crowdsourcing tasks.
The foregoing description is only an overview of the technical solutions of the present invention. In order that the technical means of the present invention may be more clearly understood, and that the above and other objects, features, and advantages of the present invention may become more apparent, specific embodiments of the invention are described below.
The above and other objects, advantages and features of the present invention will become more apparent to those skilled in the art from the following detailed description of specific embodiments thereof, taken in conjunction with the accompanying drawings.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
fig. 1 shows a flow diagram of a crowdsourcing quality control method according to an embodiment of the invention;
FIG. 2 illustrates a flow diagram of a crowdsourcing quality control method according to one embodiment of the invention;
FIG. 3 illustrates a flow diagram of a crowdsourcing quality control method according to another embodiment of the invention; and
fig. 4 is a schematic structural diagram of a crowdsourcing quality control device according to an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
To solve the above technical problem, an embodiment of the present invention provides a method for controlling crowdsourcing quality. Fig. 1 shows a flow chart of a crowdsourcing quality control method according to an embodiment of the invention. Referring to fig. 1, the method may include at least the following steps S102 to S110.
Step S102: randomly extract data from all crowdsourcing tasks according to a first designated proportion and make n copies of the extracted data, where the extracted data together with the copies serve as redundant data, the data that is not extracted serves as non-redundant data, and n is an integer greater than or equal to 2 and less than the number of crowdsourcing work units.
Step S104: distribute the total task consisting of the non-redundant data and the redundant data to crowdsourcing work units for labeling, where the tasks assigned to the same crowdsourcing work unit contain no repeated redundant data.
Step S106: after all the assigned tasks are labeled, obtain a final answer for the redundant data from the labeling results of the redundant data in all assigned tasks by means of a majority-element-finding algorithm.
Step S108: check the labeling results of the redundant data in each crowdsourcing work unit's task against the final answer for the redundant data to obtain each unit's accuracy on the redundant data, and take this accuracy as the accuracy of that unit's labeling results.
Step S110: process the labeling results of the crowdsourcing task completed by each crowdsourcing work unit according to the accuracy of that unit's labeling results.
The crowdsourcing quality control method provided by the embodiments of the present invention randomly extracts data from all crowdsourcing tasks according to a first designated proportion, makes several copies of the extracted data to serve as redundant data, and keeps the single copy of the unextracted data as non-redundant data. The total task consisting of the redundant and non-redundant data is then distributed to crowdsourcing work units for labeling, with the guarantee that the tasks assigned to the same crowdsourcing work unit contain no repeated redundant data. After the crowdsourcing task is completed, a final answer for the redundant data is obtained from its labeling results by a majority-element-finding algorithm, and the labeling results of the redundant data in each unit's task are checked against this final answer to obtain each unit's accuracy on the redundant data. Finally, the overall completion of the crowdsourcing task is judged from each unit's accuracy on the redundant data, and the labeling results of the completed crowdsourcing task are processed accordingly. In this way, unsupervised crowdsourcing quality control is achieved while only slightly increasing the task volume, greatly reducing the capital and labor costs of crowdsourcing tasks.
In step S102, the extraction proportion of the data (i.e., the first designated proportion) and the number of copies n are determined mainly by the difficulty of the crowdsourcing task and the capability level of the crowdsourcing work units. In general, the extraction proportion is set according to the budget and the total volume of the crowdsourcing task; for example, it may be set to 10%. Considering both cost and quality, the number of copies n is preferably an integer in the range of 2 to 5.
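To make step S102 concrete, here is a minimal Python sketch of the redundancy-generation step; the function name, the use of integer task identifiers, and the fixed random seed are illustrative assumptions, not part of the patented method.

```python
import random

def split_redundant(task_ids, first_ratio=0.1, n_copies=3, seed=0):
    """Randomly extract a proportion of tasks and make n_copies copies of
    each extracted item; the extracted items plus their copies form the
    redundant data, everything else the non-redundant data."""
    rng = random.Random(seed)
    k = int(len(task_ids) * first_ratio)
    extracted = set(rng.sample(task_ids, k))
    non_redundant = [t for t in task_ids if t not in extracted]
    # each extracted item ends up appearing n_copies + 1 times in total
    redundant = [t for t in extracted for _ in range(n_copies + 1)]
    return redundant, non_redundant

# e.g. 1000 tasks at 10% with n = 3 -> 400 redundant, 900 non-redundant
redundant, non_redundant = split_redundant(list(range(1000)))
```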
In step S104, distributing the total task consisting of the non-redundant data and the redundant data to crowdsourcing work units can be implemented in either of the following two ways.
First distribution mode: in the tasks assigned to each crowdsourcing work unit, redundant data and non-redundant data are placed in a second designated proportion, where the second designated proportion is determined by the total number of crowdsourcing tasks, the first designated proportion, and the number of copies n.
The first distribution mode is described below with reference to a specific example. A code sketch of the job construction is given after the following note.
Assume the total crowdsourcing task contains 1000 pieces of data and the estimated number of crowdsourcing work units is 4. 10% of the data (i.e., 100 pieces) is extracted, and 3 copies are made of each extracted piece, yielding 400 pieces of redundant data and 900 pieces of non-redundant data. For task allocation, the 900 non-redundant pieces are divided into 20 jobs of 45 pieces each. Since only 100 of the 400 redundant pieces are distinct and the remaining 300 are duplicates, the 20 jobs are divided into 4 shares of 5 jobs each to ensure that no single job contains duplicate data: within each share, the 100 distinct redundant pieces are spread across the 5 jobs, putting 20 redundant pieces into each job. Each job therefore contains 65 pieces of data, 20 of which are redundant. The crowdsourcing work units (i.e., the labeling units) then take jobs, one job at a time; the jobs taken by the same unit must share no repeated data, and each unit may take at most 5 jobs, which guarantees that all jobs taken by one unit contain no repeated redundant data.
It should be noted that the above examples are only illustrative and are not to be construed as limiting the present invention.
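The following Python sketch shows one way the job construction in the example above could be implemented; the partitioning scheme and the function signature are one possible reading of the example, assumed for illustration only.

```python
def build_jobs(non_redundant, distinct_redundant, n_copies=3,
               jobs_per_share=5, per_job_nonred=45):
    """Partition the non-redundant items into shares * jobs_per_share jobs
    and spread the distinct redundant items over each share so that no
    single job contains duplicate data (here: 20 jobs of 45 + 20 items)."""
    shares = n_copies + 1                        # 4 shares of 5 jobs each
    total_jobs = shares * jobs_per_share         # 20 jobs
    jobs = [non_redundant[i * per_job_nonred:(i + 1) * per_job_nonred]
            for i in range(total_jobs)]
    per_job_red = len(distinct_redundant) // jobs_per_share  # 20 per job
    for s in range(shares):
        for j in range(jobs_per_share):
            chunk = distinct_redundant[j * per_job_red:(j + 1) * per_job_red]
            jobs[s * jobs_per_share + j].extend(chunk)
    return jobs   # each job: 45 non-redundant + 20 redundant = 65 items
```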
Second distribution mode: the redundant data and the non-redundant data are placed into a data pool. During labeling, each crowdsourcing work unit draws one piece of data from the pool at a time for labeling until all data in the pool are labeled, where the same crowdsourcing work unit cannot draw repeated data; this ensures that the task assigned to any single unit contains no repeated redundant data.
Under the second distribution mode, to ensure the completion of the crowdsourcing task, each crowdsourcing work unit that draws data from the pool keeps drawing until the pool contains no more data that the unit has not labeled. On the premise that the task assigned to any single unit contains no repeated redundant data, this maximizes each unit's task completion rate and reduces the extra management cost that too many work units would incur.
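A minimal sketch of the second distribution mode follows, assuming data items are identified by hashable ids and that several instances of the same redundant id may sit in the pool; the class design is an assumption for illustration.

```python
class DataPool:
    """Data pool for the second distribution mode: each work unit draws one
    item at a time and is never handed an id it has already labeled."""
    def __init__(self, items):
        self.items = list(items)   # may hold several instances of one id
        self.seen = {}             # unit id -> set of ids already drawn

    def draw(self, unit_id):
        seen = self.seen.setdefault(unit_id, set())
        for i, item in enumerate(self.items):
            if item not in seen:   # skip ids this unit has already labeled
                seen.add(item)
                return self.items.pop(i)
        return None  # nothing left in the pool that this unit has not seen

# a unit keeps calling pool.draw(unit_id) until it returns None
```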
In step S106, after all crowdsourcing tasks are labeled, the final answer for the redundant data is obtained from its labeling results by a majority-element-finding algorithm. Such algorithms include counting-based methods, sorting and taking the middle element, and the like.
Preferably, the majority-element-finding algorithm is the majority voting algorithm.
The majority voting algorithm (Majority Vote), also called the Boyer-Moore voting algorithm, finds the majority element in a sequence of elements with linear time complexity O(n) and constant space complexity O(1). It is a classic streaming algorithm that quickly finds the element appearing more than half the time in an array, i.e., the majority element.
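For reference, a standard Python implementation of the Boyer-Moore majority vote is shown below, applied per redundant item to the n+1 labels it received; the verification pass and the None return when no strict majority exists are implementation choices rather than requirements of the patent.

```python
def majority_vote(labels):
    """Boyer-Moore majority vote: O(n) time, O(1) extra space. Returns the
    element occurring more than half the time, or None if there is none."""
    candidate, count = None, 0
    for label in labels:
        if count == 0:
            candidate, count = label, 1
        elif label == candidate:
            count += 1
        else:
            count -= 1
    # second pass: confirm the candidate really is a strict majority
    if labels and sum(1 for l in labels if l == candidate) * 2 > len(labels):
        return candidate
    return None

# final answer for one redundant item, e.g. labels from 4 instances
assert majority_vote(["cat", "cat", "dog", "cat"]) == "cat"
```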
In step S108, the labeling results of the redundant data in each crowdsourcing work unit's task are checked against the final answer obtained for the redundant data, yielding each unit's accuracy on the redundant data.
In an alternative embodiment, checking the labeling results of the redundant data in each unit's task against the final answer can be implemented as follows:
compare each unit's labeling result for each piece of redundant data with the final answer and mark it as correct or incorrect.
Each unit's accuracy on the redundant data is then taken as the accuracy of that unit's labeling results, and the unit's overall completion of the crowdsourcing task is judged from this accuracy.
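A sketch of this check in Python, under the assumption that labels are stored in dicts keyed by item id; returning None for units that saw no redundant data defers them to the spot check described next.

```python
def unit_accuracy(unit_labels, final_answers):
    """Accuracy of one unit on the redundant data it labeled.
    unit_labels / final_answers: dicts mapping item id -> label."""
    checked = [(iid, lab) for iid, lab in unit_labels.items()
               if iid in final_answers]
    if not checked:
        return None   # unit received no redundant data; spot-check instead
    correct = sum(1 for iid, lab in checked if lab == final_answers[iid])
    return correct / len(checked)
```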
In addition, under the second distribution mode described above, some crowdsourcing work units may draw no redundant data because of the randomness of data distribution. Therefore, in a preferred embodiment of the present invention, for a crowdsourcing work unit that receives no redundant data, the labeling results of the crowdsourcing task it completed may be spot-checked to obtain the accuracy of its labeling results, and the unit's overall completion of the crowdsourcing task may be judged from this accuracy. By combining automatic checking with spot-checking, the overall crowdsourcing labeling quality is assessed with only a small amount of data, saving labeling cost while preserving control over the quality of crowdsourcing task completion.
The accuracy data obtained by checking the labeling results according to an embodiment of the present invention may typically be presented as in the following table:
| # | Crowdsourcing work unit | Number of completions | Current batch accuracy |
| --- | --- | --- | --- |
| 1 | AA3340 | 306 | 86.67% (65/75) |
| 2 | BB666 | 276 | 91.89% (68/74) |
| 3 | CC927 | 280 | 90.36% (75/83) |
| 4 | DD1998 | 290 | 87.5% (56/64) |
| 5 | EE233 | 88 | 85.71% (24/28) |
| 6 | FF87 | 327 | 87.21% (75/86) |
| 7 | GG959 | 271 | 84.72% (61/72) |
| 8 | HH595 | 238 | 90% (45/50) |
| 9 | II1025 | 469 | 92.31% (96/104) |
| 10 | JJ0618 | 269 | 84.29% (59/70) |
It should be noted that the data in the above table are only illustrative and should not be construed as limiting the invention.
In step S110, the labeling results of the crowdsourcing task completed by each crowdsourcing work unit are processed according to the accuracy of that unit's labeling results, ensuring control over the quality of crowdsourcing task completion.
In an alternative embodiment, step S110 may be implemented as the following steps:
First, the accuracy of each unit's labeling results is compared with a preset threshold.
Then, if the accuracy of a unit's labeling results is below the preset threshold, a processing measure is applied to that unit. The processing measure may include penalizing the crowdsourcing work unit and/or deleting the labeling results of that unit's crowdsourcing task.
Preferably, the preset threshold can be customized by the task publisher according to task requirements, which improves the flexibility of crowdsourcing tasks.
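A minimal sketch of this processing step; the 0.85 threshold is an arbitrary placeholder for the publisher-defined value, and passing through units whose accuracy is None (no redundant data drawn) is an assumption that they are handled by spot-checking.

```python
def process_results(unit_accuracies, unit_results, threshold=0.85):
    """Split units into accepted and rejected by labeling accuracy; the
    rejected results would be deleted and/or the unit penalized."""
    accepted, rejected = {}, {}
    for unit, acc in unit_accuracies.items():
        if acc is not None and acc < threshold:
            rejected[unit] = unit_results[unit]
        else:
            accepted[unit] = unit_results[unit]  # None -> spot-check path
    return accepted, rejected
```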
Having introduced various implementations of each stage of the embodiment shown in Fig. 1, the implementation process of the crowdsourcing quality control method of the present invention is described in detail below through specific embodiments.
Example one
Fig. 2 shows a flow chart of a crowdsourcing quality control method according to an embodiment of the invention. Referring to fig. 2, the method may include the following steps S202 to S212.
Step S202: randomly extract data from all crowdsourcing tasks according to a first designated proportion and make n copies of the extracted data, where the extracted data together with the copies serve as redundant data, the data that is not extracted serves as non-redundant data, and n is an integer greater than or equal to 2 and less than the number of crowdsourcing work units.
Step S204: place redundant data and non-redundant data into the tasks assigned to each crowdsourcing work unit in a second designated proportion, where the second designated proportion is determined by the total number of crowdsourcing tasks, the first designated proportion, and the number of copies n, and the tasks assigned to the same crowdsourcing work unit contain no repeated redundant data.
Step S206: after all the assigned tasks are labeled, obtain a final answer for the redundant data from the labeling results of the redundant data in all assigned tasks by a majority voting algorithm.
Step S208: check the labeling results of the redundant data in each crowdsourcing work unit's task against the final answer for the redundant data to obtain each unit's accuracy on the redundant data, and take this accuracy as the accuracy of that unit's labeling results.
Step S210: compare the accuracy of each unit's labeling results with a preset threshold.
Step S212: if the accuracy of a unit's labeling results is below the preset threshold, penalize that unit and/or delete the labeling results of its crowdsourcing task.
The crowdsourcing quality control method provided by this embodiment achieves unsupervised crowdsourcing quality control while only slightly increasing the task volume, greatly reducing the capital and labor costs of crowdsourcing tasks.
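To tie Example one together, the sketch below strings the helpers sketched earlier (split_redundant, build_jobs, majority_vote, unit_accuracy, process_results) into an end-to-end simulation; the simulated labeler, its 10% error rate, and the one-job-per-unit setup are purely illustrative assumptions.

```python
import random
from collections import defaultdict

def simulate_example_one(seed=0):
    rng = random.Random(seed)
    redundant, non_redundant = split_redundant(list(range(1000)), seed=seed)
    jobs = build_jobs(non_redundant, sorted(set(redundant)))
    votes = defaultdict(list)         # item id -> labels from all units
    unit_labels = defaultdict(dict)   # unit -> {item id: label}
    for unit, job in enumerate(jobs): # toy setup: one unit takes one job
        for item in job:
            truth = item % 2          # pretend ground truth for the demo
            label = truth if rng.random() > 0.1 else 1 - truth
            unit_labels[unit][item] = label
            votes[item].append(label)
    # redundant items are exactly those labeled by more than one unit
    finals = {i: majority_vote(v) for i, v in votes.items() if len(v) > 1}
    accs = {u: unit_accuracy(lab, finals) for u, lab in unit_labels.items()}
    return process_results(accs, unit_labels, threshold=0.85)
```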
Example two
Fig. 3 shows a flow chart of a crowdsourcing quality control method according to another embodiment of the invention. Referring to fig. 3, the method may include the following steps S302 to S314.
Step S302: randomly extract data from all crowdsourcing tasks according to a first designated proportion and make n copies of the extracted data, where the extracted data together with the copies serve as redundant data, the data that is not extracted serves as non-redundant data, and n is an integer greater than or equal to 2 and less than the number of crowdsourcing work units.
Step S304: place the redundant data and the non-redundant data into a data pool; during labeling, each crowdsourcing work unit draws one piece of data from the pool at a time for labeling until all data in the pool are labeled, where the same crowdsourcing work unit cannot draw repeated data.
Step S306: after all data in the pool are labeled, obtain a final answer for the redundant data from the labeling results of the redundant data received by all crowdsourcing work units, using a majority voting algorithm.
Step S308: for each crowdsourcing work unit that received redundant data, check that unit's labeling results on the redundant data against the final answer to obtain the unit's accuracy on the redundant data, and take this accuracy as the accuracy of the unit's labeling results.
Step S310: for each crowdsourcing work unit that received no redundant data, spot-check the labeling results of the crowdsourcing task completed by that unit to obtain the accuracy of its labeling results (a sketch of such a spot check follows these steps).
Step S312: compare the accuracy of each unit's labeling results with a preset threshold.
Step S314: if the accuracy of a unit's labeling results is below the preset threshold, penalize that unit and/or delete the labeling results of its crowdsourcing task.
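A sketch of the spot check in step S310, under the assumption that a manual review can be wrapped as a callable gold_check(item, label) -> bool; the sample size of 30 is an arbitrary illustrative choice.

```python
import random

def spot_check(unit_results, gold_check, sample_size=30, seed=0):
    """Estimate a unit's accuracy by manually reviewing a random sample of
    its labeled items (for units that drew no redundant data)."""
    rng = random.Random(seed)
    items = list(unit_results.items())
    sample = rng.sample(items, min(sample_size, len(items)))
    if not sample:
        return None
    correct = sum(1 for item, label in sample if gold_check(item, label))
    return correct / len(sample)
```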
The crowdsourcing quality control method provided by this embodiment distributes redundant and non-redundant data randomly through the data pool, reducing the complexity of task allocation. Meanwhile, by combining automatic checking with spot-checking, the overall crowdsourcing labeling quality is assessed with only a small amount of data, saving labeling cost while preserving control over the quality of crowdsourcing task completion.
It should be noted that, in practical applications, all the above optional embodiments may be combined arbitrarily to form optional embodiments of the present invention; details are not repeated here.
Based on the same inventive concept, an embodiment of the present invention further provides a crowdsourcing quality control device, which supports the crowdsourcing quality control method provided by any one of the above embodiments or a combination thereof. Fig. 4 is a schematic structural diagram of a crowdsourcing quality control device according to an embodiment of the invention. Referring to Fig. 4, the apparatus may include at least: a redundant data generation module 410, a crowdsourcing task allocation module 420, a data answer obtaining module 430, a redundant data checking module 440, and a labeling result processing module 450.
The functions of the components of the crowdsourcing quality control device of the embodiment of the present invention and the connections among them are described below:
The redundant data generation module 410 is adapted to randomly extract data from all crowdsourcing tasks according to a first designated proportion and make n copies of the extracted data, where the extracted data together with the copies serve as redundant data, the data that is not extracted serves as non-redundant data, and n is an integer greater than or equal to 2 and less than the number of crowdsourcing work units.
The crowdsourcing task allocation module 420, connected to the redundant data generation module 410, is adapted to distribute a total task consisting of the non-redundant data and the redundant data to crowdsourcing work units for labeling, where the tasks assigned to the same crowdsourcing work unit contain no repeated redundant data.
The data answer obtaining module 430, connected to the crowdsourcing task allocation module 420, is adapted to obtain, after all the assigned tasks are labeled, a final answer for the redundant data from the labeling results of the redundant data in all assigned tasks by means of a majority-element-finding algorithm.
The redundant data checking module 440, connected to the data answer obtaining module 430, is adapted to check the labeling results of the redundant data in each crowdsourcing work unit's task against the final answer for the redundant data to obtain each unit's accuracy on the redundant data, and to take this accuracy as the accuracy of that unit's labeling results.
The labeling result processing module 450, connected to the redundant data checking module 440, is adapted to process the labeling results of the crowdsourcing task completed by each crowdsourcing work unit according to the accuracy of that unit's labeling results.
In an alternative embodiment, the crowdsourcing task allocation module 420 is further adapted to:
place redundant data and non-redundant data into the tasks assigned to each crowdsourcing work unit in a second designated proportion, where the second designated proportion is determined by the total number of crowdsourcing tasks, the first designated proportion, and the number of copies n.
In another alternative embodiment, the crowdsourcing task allocation module 420 is further adapted to:
place the redundant data and the non-redundant data into a data pool;
during labeling, let each crowdsourcing work unit draw one piece of data from the pool at a time for labeling until all data in the pool are labeled, where the same crowdsourcing work unit cannot draw repeated data.
Further, to ensure the completion of the crowdsourcing task, the crowdsourcing task allocation module 420 is further adapted to:
let each crowdsourcing work unit that draws data from the data pool keep drawing until the pool contains no more data that the unit has not labeled.
Further, to preserve control over the quality of crowdsourcing task completion, and considering that some units may draw no redundant data due to the randomness of data distribution when units draw data from the pool, the crowdsourcing quality control device may further include:
a spot-checking module adapted to spot-check the labeling results of the crowdsourcing task completed by a crowdsourcing work unit that receives no redundant data, to obtain the accuracy of that unit's labeling results.
In an alternative embodiment, the majority-element-finding algorithm comprises a majority voting algorithm.
In an alternative embodiment, the redundant data checking module 440 is further adapted to:
compare the labeling results of the redundant data in each crowdsourcing work unit's task with the final answer for the redundant data and mark each result as correct or incorrect.
In an alternative embodiment, the labeling result processing module 450 is further adapted to:
compare the accuracy of each unit's labeling results with a preset threshold;
and if the accuracy of a unit's labeling results is below the preset threshold, apply a processing measure to that unit. The processing measure may include penalizing the crowdsourcing work unit and/or deleting the labeling results of that unit's crowdsourcing task.
Preferably, the preset threshold is set by the task publisher according to task requirements.
In an alternative embodiment, the number of copies n is an integer in the range of 2 to 5.
Based on the same inventive concept, the embodiment of the invention also provides a computer storage medium. The computer storage medium stores computer program code which, when run on a computing device, causes the computing device to perform a crowdsourcing quality control method according to any one or combination of the above embodiments.
Based on the same inventive concept, the embodiment of the invention also provides the computing equipment. The computing device may include:
a processor; and
a memory storing computer program code;
the computer program code, when executed by a processor, causes the computing device to perform a crowdsourcing quality control method according to any one or combination of the above embodiments.
Any one of the above optional embodiments, or a combination thereof, can achieve the following beneficial effects:
According to the crowdsourcing quality control method and device provided by the embodiments of the present invention, data is randomly extracted from all crowdsourcing tasks according to a first designated proportion, several copies of the extracted data are made to serve as redundant data, and the single copy of the data that is not extracted serves as non-redundant data. The total task consisting of the redundant and non-redundant data is then distributed to crowdsourcing work units for labeling, with the guarantee that the tasks assigned to the same crowdsourcing work unit contain no repeated redundant data. After the crowdsourcing task is completed, a final answer for the redundant data is obtained from its labeling results by a majority-element-finding algorithm, and the labeling results of the redundant data in each unit's task are checked against this final answer to obtain each unit's accuracy on the redundant data. Finally, the overall completion of the crowdsourcing task is judged from each unit's accuracy on the redundant data, and the labeling results of the completed crowdsourcing task are processed accordingly. In this way, unsupervised crowdsourcing quality control is achieved while only slightly increasing the task volume, greatly reducing the capital and labor costs of crowdsourcing tasks.
It is clear to those skilled in the art that the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and for the sake of brevity, further description is omitted here.
In addition, the functional units in the embodiments of the present invention may be physically independent of each other, two or more functional units may be integrated together, or all the functional units may be integrated in one processing unit. The integrated functional units may be implemented in the form of hardware, or in the form of software or firmware.
Those of ordinary skill in the art will understand that the integrated functional units, if implemented in software and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computing device (e.g., a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), and a magnetic or optical disk.
Alternatively, all or part of the steps of implementing the foregoing method embodiments may be implemented by hardware (such as a computing device, e.g., a personal computer, a server, or a network device) associated with program instructions, which may be stored in a computer-readable storage medium, and when the program instructions are executed by a processor of the computing device, the computing device executes all or part of the steps of the method according to the embodiments of the present invention.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments can be modified or some or all of the technical features can be equivalently replaced within the spirit and principle of the present invention; such modifications or substitutions do not depart from the scope of the present invention.
According to an aspect of the embodiments of the present invention, there is provided A1. a crowdsourcing quality control method, including:
randomly extracting data from all crowdsourcing tasks according to a first designated proportion and making n copies of the extracted data, where the extracted data together with the copies serve as redundant data, the data that is not extracted serves as non-redundant data, and n is an integer greater than or equal to 2 and less than the number of crowdsourcing work units;
distributing a total task consisting of the non-redundant data and the redundant data to crowdsourcing work units for labeling, where the tasks assigned to the same crowdsourcing work unit contain no repeated redundant data;
after all the assigned tasks are labeled, obtaining a final answer for the redundant data from the labeling results of the redundant data in all assigned tasks by means of a majority-element-finding algorithm;
checking the labeling results of the redundant data in each crowdsourcing work unit's task against the final answer for the redundant data to obtain each unit's accuracy on the redundant data, and taking this accuracy as the accuracy of that unit's labeling results;
and processing the labeling results of the crowdsourcing task completed by each crowdsourcing work unit according to the accuracy of that unit's labeling results.
A2. The method of A1, wherein distributing the total task consisting of the non-redundant data and the redundant data to crowdsourcing work units comprises:
placing the redundant data and the non-redundant data into the tasks assigned to each crowdsourcing work unit in a second designated proportion, where the second designated proportion is determined by the total number of crowdsourcing tasks, the first designated proportion, and the number of copies n.
A3. The method of A1 or A2, wherein distributing the total task consisting of the non-redundant data and the redundant data to crowdsourcing work units comprises:
placing the redundant data and the non-redundant data into a data pool;
during labeling, each crowdsourcing work unit draws one piece of data from the data pool at a time for labeling until all data in the pool are labeled, where the same crowdsourcing work unit cannot draw repeated data.
A4. The method of any one of A1-A3, wherein, when the crowdsourcing work units draw data from the data pool for labeling, each unit keeps drawing data from the pool until the pool contains no more data that the unit has not labeled.
A5. The method of any one of A1-A4, further comprising:
for a crowdsourcing work unit that receives no redundant data, spot-checking the labeling results of the crowdsourcing task completed by that unit to obtain the accuracy of its labeling results.
A6. The method of any one of A1-A5, wherein the majority-element-finding algorithm comprises a majority voting algorithm.
A7. The method of any one of A1-A6, wherein checking the labeling results of the redundant data in each crowdsourcing work unit's task against the final answer for the redundant data comprises:
comparing the labeling results of the redundant data in each unit's task with the final answer and marking each result as correct or incorrect.
A8. The method of any one of A1-A7, wherein processing the labeling results of the crowdsourcing task completed by each crowdsourcing work unit according to the accuracy of that unit's labeling results comprises:
comparing the accuracy of each unit's labeling results with a preset threshold;
and if the accuracy of a unit's labeling results is below the preset threshold, applying a processing measure to that unit.
A9. The method of any one of A1-A8, wherein the preset threshold is set by the task publisher according to task requirements.
A10. The method of any one of A1-A9, wherein the processing measure includes penalizing the crowdsourcing work unit and/or deleting the labeling results of that unit's crowdsourcing task.
A11. The method of any one of A1-A10, wherein n is an integer in the range of 2-5.
According to another aspect of the embodiments of the present invention, there is also provided b12. a crowdsourcing quality control apparatus, including:
the system comprises a redundant data generation module, a crowdsourcing task module and a crowdsourcing task module, wherein the redundant data generation module is suitable for randomly extracting data from all crowdsourcing tasks according to a first specified proportion and copying n parts of the extracted data, the extracted data and the copied data serve as redundant data, data which are not extracted serve as non-redundant data, and n is an integer which is greater than or equal to 2 and less than the number of crowdsourcing work units;
the crowdsourcing task allocation module is suitable for allocating a total task consisting of the non-redundant data and the redundant data to a crowdsourcing work unit for marking, wherein the tasks allocated by the same crowdsourcing work unit do not contain repeated redundant data;
the data answer obtaining module is suitable for obtaining a final answer of the redundant data through an algorithm for searching a plurality of elements according to the labeling result of the redundant data in all the distributed tasks after all the distributed tasks are labeled;
the redundant data checking module is suitable for checking the marking result of the redundant data in the task of each crowdsourcing work unit according to the final answer of the redundant data to obtain the correct rate of each crowdsourcing work unit on the redundant data, and taking the correct rate of each crowdsourcing work unit on the redundant data as the correct rate of the marking result of each crowdsourcing work unit; and
and the labeling result processing module is suitable for processing the labeling results of the crowdsourcing tasks completed by each crowdsourcing work unit according to the accuracy of the labeling results of the crowdsourcing work units.
B13. The apparatus of B12, wherein the crowdsourcing task allocation module is further adapted to:
place, in the task allocated to each crowdsourcing work unit, the redundant data and the non-redundant data according to a second specified proportion, wherein the second specified proportion is determined by the total number of crowdsourcing task items, the first specified proportion and the copy count n.
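One plausible reading of B13, stated explicitly as an assumption: if N items exist in total, a proportion p is extracted and each extracted item is copied n times, then the pool holds N·p·(n+1) redundant entries and N·(1−p) non-redundant entries, and the second specified proportion could be computed as below.

    def second_proportion(total, p, n):
        """Fraction of redundant entries in the combined pool, assuming the
        extracted items and all n copies of each count as redundant data."""
        redundant = total * p * (n + 1)
        non_redundant = total * (1 - p)
        return redundant / (redundant + non_redundant)

For instance, with total=1000, p=0.1 and n=2, the pool holds 300 redundant and 900 non-redundant entries, giving a second proportion of 0.25.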
B14. The apparatus of B12 or B13, wherein the crowdsourcing task allocation module is further adapted to:
place the redundant data and the non-redundant data into a data pool;
during labeling, have each crowdsourcing work unit draw one piece of data from the data pool at a time for labeling until all the data in the data pool has been labeled, wherein the same crowdsourcing work unit cannot draw repeated data.
B15. The apparatus of any one of B12-B14, wherein the crowdsourcing task allocation module is further adapted to:
when a crowdsourcing work unit draws data from the data pool for labeling, allow it to keep drawing data from the data pool until the pool contains no data that the work unit has not yet labeled.
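A hedged sketch of the data pool behavior described in B14 and B15: entries are drawn one at a time, a work unit never draws a payload it has already labeled, and it can keep drawing until nothing new remains for it. The class and method names are assumptions for illustration.

    import random

    class DataPool:
        def __init__(self, entries):
            # entries: list of (entry_id, payload); copies of the same
            # redundant item share a payload but carry distinct entry ids.
            self.entries = list(entries)
            self.labeled = {}  # unit_id -> set of payloads already labeled

        def draw(self, unit_id):
            """Hand one not-yet-labeled entry to this work unit, or return
            None when the pool holds nothing new for this unit (B15)."""
            seen = self.labeled.setdefault(unit_id, set())
            candidates = [i for i, (_, payload) in enumerate(self.entries)
                          if payload not in seen]
            if not candidates:
                return None
            entry_id, payload = self.entries.pop(random.choice(candidates))
            seen.add(payload)
            return entry_id, payload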
B16. The apparatus of any one of B12-B15, further comprising:
a sampling check module, adapted to sample and check the labeling results of the crowdsourcing tasks completed by a crowdsourcing work unit that has received no redundant data, to obtain the accuracy of the labeling results of that work unit.
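For the sampling check module of B16, a minimal sketch under the assumption that a reviewer function can judge a single labeled result; the default sample size is an arbitrary illustrative value.

    import random

    def spot_check(labeled_results, is_correct, sample_size=20):
        """Randomly sample some of a work unit's labeled results, have the
        reviewer judge each one, and return the observed accuracy."""
        if not labeled_results:
            return 0.0
        sample = random.sample(labeled_results,
                               min(sample_size, len(labeled_results)))
        return sum(1 for r in sample if is_correct(r)) / len(sample)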
B17. The apparatus of any one of B12-B16, wherein the algorithm for finding majority elements comprises a majority voting algorithm.
B18. The apparatus of any one of B12-B17, wherein the redundant data checking module is further adapted to:
compare the labeling result of the redundant data in the task of each crowdsourcing work unit with the final answer for the redundant data, and judge whether each labeling result is right or wrong.
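A sketch of the check in B18, assuming the work unit's labels and the final answers are both keyed by the id of the underlying redundant item; the dict layout is an assumption for illustration.

    def redundant_accuracy(unit_labels, final_answers):
        """unit_labels: {item_id: label} for one work unit;
        final_answers: {item_id: majority label}. Returns this unit's
        accuracy on the redundant data it labeled, or None if it saw none."""
        judged = [item_id for item_id in unit_labels
                  if item_id in final_answers]
        if not judged:
            return None  # this unit received no redundant data (see B16)
        correct = sum(1 for item_id in judged
                      if unit_labels[item_id] == final_answers[item_id])
        return correct / len(judged)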
B19. The apparatus of any one of B12-B18, wherein the labeling result processing module is further adapted to:
compare the accuracy of the labeling results of each crowdsourcing work unit with a preset threshold; and
if the accuracy of the labeling results of a crowdsourcing work unit is below the preset threshold, apply a processing measure to that crowdsourcing work unit.
B20. The apparatus of any one of B12-B19, wherein the preset threshold is set by the task publisher according to the task requirements.
B21. The apparatus of any one of B12-B20, wherein the processing measure comprises penalizing the crowdsourcing work unit and/or deleting the labeling results of the crowdsourcing tasks completed by that work unit.
B22. The apparatus of any one of B12-B21, wherein n is an integer in the range of 2-5.
According to yet another aspect of the embodiments of the present invention, there is also provided C23. A computer storage medium storing computer program code which, when run on a computing device, causes the computing device to perform the crowdsourcing quality control method of any one of A1-A11.
There is also provided, in accordance with yet another aspect of the embodiments of the present invention, a computing device, comprising:
a processor; and
a memory storing computer program code;
wherein the computer program code, when executed by the processor, causes the computing device to perform the crowdsourcing quality control method of any one of A1-A11.

Claims (10)

1. A crowdsourcing quality control method, comprising:
randomly extracting data from all crowdsourcing tasks according to a first specified proportion and making n copies of the extracted data, wherein the extracted data and the copies serve as redundant data, the data not extracted serves as non-redundant data, and n is an integer greater than or equal to 2 and less than the number of crowdsourcing work units;
distributing a total task consisting of the non-redundant data and the redundant data to crowdsourcing work units for labeling, wherein the task distributed to any single crowdsourcing work unit does not contain repeated redundant data;
after all the distributed tasks have been labeled, obtaining the final answer for the redundant data through an algorithm for finding majority elements according to the labeling results of the redundant data in all the distributed tasks;
verifying the labeling results of the redundant data in the task of each crowdsourcing work unit against the final answer for the redundant data to obtain the accuracy of each crowdsourcing work unit on the redundant data, and taking that accuracy as the accuracy of the labeling results of that crowdsourcing work unit; and
processing the labeling results of the crowdsourcing tasks completed by each crowdsourcing work unit according to the accuracy of the labeling results of that crowdsourcing work unit.
2. The method of claim 1, wherein distributing a total task consisting of the non-redundant data and the redundant data to crowdsourcing work units comprises:
placing, in the task distributed to each crowdsourcing work unit, the redundant data and the non-redundant data according to a second specified proportion, wherein the second specified proportion is determined by the total number of crowdsourcing task items, the first specified proportion and the copy count n.
3. The method of claim 1 or 2, wherein distributing a total task consisting of the non-redundant data and the redundant data to crowdsourcing work units comprises:
placing the redundant data and the non-redundant data into a data pool;
during labeling, each crowdsourcing work unit draws one piece of data from the data pool at a time for labeling until all the data in the data pool has been labeled, wherein the same crowdsourcing work unit cannot draw repeated data.
4. The method of any one of claims 1-3, wherein, when drawing data from the data pool for labeling, each crowdsourcing work unit can keep drawing data from the data pool until the pool contains no data that the work unit has not yet labeled.
5. The method of any one of claims 1-4, further comprising:
for a crowdsourcing work unit that has received no redundant data, sampling and checking the labeling results of the crowdsourcing tasks completed by that work unit to obtain the accuracy of its labeling results.
6. The method of any one of claims 1-5, wherein the algorithm for finding majority elements comprises a majority voting algorithm.
7. The method of any one of claims 1-6, wherein checking the labeling results of the redundant data in the task of each crowdsourcing work unit against the final answer for the redundant data comprises:
comparing the labeling result of the redundant data in the task of each crowdsourcing work unit with the final answer for the redundant data, and judging whether each labeling result is right or wrong.
8. A crowdsourcing quality control apparatus, comprising:
a redundant data generation module, adapted to randomly extract data from all crowdsourcing tasks according to a first specified proportion and make n copies of the extracted data, wherein the extracted data and the copies serve as redundant data, the data not extracted serves as non-redundant data, and n is an integer greater than or equal to 2 and less than the number of crowdsourcing work units;
a crowdsourcing task allocation module, adapted to allocate a total task consisting of the non-redundant data and the redundant data to crowdsourcing work units for labeling, wherein the task allocated to any single crowdsourcing work unit does not contain repeated redundant data;
a data answer obtaining module, adapted to obtain, after all the allocated tasks have been labeled, the final answer for the redundant data through an algorithm for finding majority elements according to the labeling results of the redundant data in all the allocated tasks;
a redundant data checking module, adapted to check the labeling results of the redundant data in the task of each crowdsourcing work unit against the final answer for the redundant data, obtain the accuracy of each crowdsourcing work unit on the redundant data, and take that accuracy as the accuracy of the labeling results of that crowdsourcing work unit; and
a labeling result processing module, adapted to process the labeling results of the crowdsourcing tasks completed by each crowdsourcing work unit according to the accuracy of the labeling results of that work unit.
9. A computer storage medium storing computer program code which, when run on a computing device, causes the computing device to perform the crowdsourcing quality control method of any one of claims 1-7.
10. A computing device, comprising:
a processor; and
a memory storing computer program code;
wherein the computer program code, when executed by the processor, causes the computing device to perform the crowdsourcing quality control method of any one of claims 1-7.
CN201811554257.6A 2018-12-18 2018-12-18 Crowd-sourced quality control method, device, computer storage medium and computing equipment Active CN111339068B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811554257.6A CN111339068B (en) 2018-12-18 2018-12-18 Crowd-sourced quality control method, device, computer storage medium and computing equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811554257.6A CN111339068B (en) 2018-12-18 2018-12-18 Crowd-sourced quality control method, device, computer storage medium and computing equipment

Publications (2)

Publication Number Publication Date
CN111339068A true CN111339068A (en) 2020-06-26
CN111339068B CN111339068B (en) 2024-04-19

Family

ID=71183263

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811554257.6A Active CN111339068B (en) 2018-12-18 2018-12-18 Crowd-sourced quality control method, device, computer storage medium and computing equipment

Country Status (1)

Country Link
CN (1) CN111339068B (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140165071A1 (en) * 2012-12-06 2014-06-12 Xerox Corporation Method and system for managing allocation of tasks to be crowdsourced
US20140343984A1 (en) * 2013-03-14 2014-11-20 University Of Southern California Spatial crowdsourcing with trustworthy query answering
US20150235160A1 (en) * 2014-02-20 2015-08-20 Xerox Corporation Generating gold questions for crowdsourcing
CN104599084A (en) * 2015-02-12 2015-05-06 北京航空航天大学 Crowd calculation quality control method and device
US20170277678A1 (en) * 2016-03-24 2017-09-28 Document Crowdsourced Proof Reading, LLC Document crowdsourced proofreading system and method
CN106779307A (en) * 2016-11-22 2017-05-31 崔岩 The data processing method and system of cubic management system
CN108537240A (en) * 2017-03-01 2018-09-14 华东师范大学 Commodity image semanteme marking method based on domain body
CN107273492A (en) * 2017-06-15 2017-10-20 复旦大学 A kind of exchange method based on mass-rent platform processes image labeling task
CN107729378A (en) * 2017-07-13 2018-02-23 华中科技大学 A kind of data mask method
CN107679766A (en) * 2017-10-24 2018-02-09 北京航空航天大学 A kind of gunz task dynamic redundancy dispatching method and device
CN108154306A (en) * 2017-12-28 2018-06-12 百度在线网络技术(北京)有限公司 Task processing method, device, equipment and the computer readable storage medium of crowdsourcing model
CN108647858A (en) * 2018-04-12 2018-10-12 华东师范大学 A kind of collaboration crowdsourcing method of quality control based on user's inconsistency information

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JIYI LI et al.: "Incorporating Worker Similarity for Label Aggregation in Crowdsourcing", ICANN 2018: Artificial Neural Networks and Machine Learning, 26 September 2018 (2018-09-26), pages 596-606, XP047487301, DOI: 10.1007/978-3-030-01421-6_57 *
史珩: "Quality Control Model for Crowdsourced Classification Data Based on Self-Paced Learning", China Master's Theses Full-text Database, Economics and Management Sciences, no. 4, 15 April 2018 (2018-04-15), pages 152-555 *
张志强 et al.: "Research on Crowdsourcing Quality Control Strategies and Evaluation Algorithms", Chinese Journal of Computers, vol. 36, no. 8, 15 August 2013 (2013-08-15), pages 1636-1649 *

Also Published As

Publication number Publication date
CN111339068B (en) 2024-04-19

Similar Documents

Publication Publication Date Title
CN110837410B (en) Task scheduling method and device, electronic equipment and computer readable storage medium
CN108833458B (en) Application recommendation method, device, medium and equipment
CN106484606A (en) Method and apparatus submitted to by a kind of code
US20190235987A1 (en) Duplicate bug report detection using machine learning algorithms and automated feedback incorporation
CN106372977B (en) A kind of processing method and equipment of virtual account
US11321359B2 (en) Review and curation of record clustering changes at large scale
CN109214634A (en) A kind of information processing method, device and information processing readable medium
CN111367782A (en) Method and device for automatically generating regression test data
CN110866000B (en) Data quality evaluation method and device, electronic equipment and storage medium
US20130013244A1 (en) Pattern based test prioritization using weight factors
CN111311276B (en) Identification method and device for abnormal user group and readable storage medium
CN111339068A (en) Crowdsourcing quality control method, apparatus, computer storage medium and computing device
CN109451332B (en) User attribute marking method and device, computer equipment and medium
CN109583691B (en) Electronic device, orphan list distribution method, and computer-readable storage medium
CN110262950A (en) Abnormal movement detection method and device based on many index
CN107395447A (en) Module detection method, power system capacity predictor method and corresponding equipment
CN111967798B (en) Method, device and equipment for distributing experimental samples and computer readable storage medium
CN112801551B (en) Test method, device, equipment and storage medium for online room selection system
CN112559347B (en) Test distribution method and apparatus, device, readable medium and computer program product
CN110851344B (en) Big data testing method and device based on complexity of calculation formula and electronic equipment
CN113159537A (en) Evaluation method and device for new technical project of power grid and computer equipment
CN114020420A (en) Distributed to-be-executed task execution method and system, storage medium and terminal
CN109344047B (en) System regression testing method, computer-readable storage medium, and terminal device
JP2020161044A (en) System, method, and program for managing data
CN111768130A (en) User allocation method and device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant