CN111339068A - Crowdsourcing quality control method, apparatus, computer storage medium and computing device

Info

Publication number
CN111339068A
Authority
CN
China
Prior art keywords
crowdsourcing
data
redundant data
work unit
task
Prior art date
Legal status
Granted
Application number
CN201811554257.6A
Other languages
Chinese (zh)
Other versions
CN111339068B (en)
Inventor
耿仕强
Current Assignee
Beijing Qihoo Technology Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd
Priority to CN201811554257.6A
Publication of CN111339068A
Application granted
Publication of CN111339068B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00: Administration; Management
    • G06Q 10/06: Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q 10/063: Operations research, analysis or management
    • G06Q 10/0631: Resource planning, allocation, distributing or scheduling for enterprises or organisations

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Educational Administration (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a crowdsourcing quality control method and device. The method first randomly extracts a certain proportion of data from a crowdsourcing task and copies the extracted data multiple times; the extracted data together with the copies serve as redundant data, while the data that is not extracted serves as non-redundant data. The redundant and non-redundant data are then distributed to crowdsourcing work units for labeling, with the guarantee that the task of any single crowdsourcing work unit contains no repeated data. After labeling is finished, a final answer for the redundant data is derived from its labeling results by a majority-element-finding algorithm, and each crowdsourcing work unit's labeling results on the redundant data are checked against this final answer to obtain the unit's accuracy on the redundant data. Finally, the overall completion of the crowdsourcing task is judged from each unit's accuracy on the redundant data. The invention achieves unsupervised crowdsourcing quality control and greatly reduces the capital and labor costs of crowdsourcing tasks.

Description

Crowdsourcing quality control method, apparatus, computer storage medium and computing device
Technical Field
The present invention relates to the field of internet technologies, and in particular, to a crowdsourcing quality control method, a crowdsourcing quality control apparatus, a computer storage medium, and a computing device.
Background
Crowdsourcing refers to the practice in which a company or organization outsources, on a voluntary basis, tasks formerly performed by employees to an unspecified (and usually large) network of people; it is a distributed problem-solving and production model open to the crowd on the internet. Crowdsourcing has the advantages of low cost, fast turnaround, and suitability for large-scale tasks, and has become increasingly popular. However, because crowdsourcing is loosely organized and crowdsourcing work units differ in domain knowledge, skill level, motivation, and diligence (for example, some work units are cheaters who aim only at quickly collecting rewards and treat the task with extreme carelessness), low-quality crowdsourcing results are common. The quality of crowdsourcing has therefore become the main concern about crowdsourcing technology today.
To address the problem of low crowdsourcing quality, research on crowdsourcing quality control strategies has gradually developed. Four techniques are currently used to control crowdsourcing quality:
First, repeating the crowdsourcing task to generate redundant data. That is, each task is distributed to multiple crowdsourcing work units simultaneously, and a statistical method (such as majority voting) is then used to derive the final result from the multiple results. However, this approach multiplies the task volume, with the attendant problem of sharply increased cost.
Second, checking annotation quality with gold standard data. Gold standard data refers to data with standard answers, generally produced by experts in the domain of the task. A certain proportion of gold standard data is mixed into each crowdsourcing work unit's task to test the quality of the worker's labeling results. However, labeling a large amount of gold standard data increases labor and capital costs.
Third, examining and testing crowdsourcing work units in advance. That is, before a crowdsourcing work unit receives a crowdsourcing task, it is tested with data whose answers are known, and only units that pass the assessment qualify to start the task. Many crowdsourcing tasks on Amazon's crowdsourcing platform take this approach. Its drawback is that a batch of data with standard answers must be prepared in advance, incurring extra cost. Moreover, it cannot detect the labeling quality of workers who, after qualifying, fail to complete the crowdsourcing task conscientiously.
Fourth, auditing the crowdsourcing results with full-time staff. That is, professionals are employed to fully or randomly inspect the crowdsourcing results and delete the unqualified ones. This approach, however, also suffers from high capital and labor costs and is unsuitable for large-scale crowdsourcing tasks.
In summary, while the prior art improves crowdsourcing quality to some extent, it increases labor and capital costs. A crowdsourcing quality control method that saves both capital and labor is therefore needed.
Disclosure of Invention
In view of the above, the present invention has been made to provide a crowdsourcing quality control method, a crowdsourcing quality control apparatus, a computer storage medium, and a computing device that overcome or at least partially address the above problems.
According to an aspect of the embodiments of the present invention, there is provided a crowdsourcing quality control method, including:
randomly extracting data from all crowdsourcing tasks according to a first designated proportion and making n copies of the extracted data, where the extracted data together with the copies serve as redundant data, the data that is not extracted serves as non-redundant data, and n is an integer greater than or equal to 2 and less than the number of crowdsourcing work units;
distributing a total task consisting of the non-redundant data and the redundant data to crowdsourcing work units for labeling, where the tasks assigned to the same crowdsourcing work unit contain no repeated redundant data;
after all the assigned tasks are labeled, obtaining a final answer for the redundant data from the labeling results of the redundant data in all assigned tasks by means of a majority-element-finding algorithm;
checking the labeling results of the redundant data in each crowdsourcing work unit's task against the final answer for the redundant data to obtain each unit's accuracy on the redundant data, and taking this accuracy as the accuracy of that unit's labeling results;
and processing the labeling results of the crowdsourcing task completed by each crowdsourcing work unit according to the accuracy of that unit's labeling results.
Optionally, distributing the total task consisting of the non-redundant data and the redundant data to crowdsourcing work units includes:
placing the redundant data and the non-redundant data into the tasks assigned to each crowdsourcing work unit in a second designated proportion, where the second designated proportion is determined by the total number of crowdsourcing tasks, the first designated proportion, and the number of copies n.
Optionally, distributing the total task consisting of the non-redundant data and the redundant data to crowdsourcing work units includes:
placing the redundant data and the non-redundant data into a data pool;
during labeling, each crowdsourcing work unit draws one piece of data from the data pool at a time for labeling until all data in the pool are labeled, where the same crowdsourcing work unit cannot draw repeated data.
Optionally, when the crowdsourcing work units draw data from the data pool for labeling, each unit keeps drawing data from the pool until the pool contains no more data that the unit has not labeled.
Optionally, the method further comprises:
for a crowdsourcing work unit that receives no redundant data, spot-checking the labeling results of the crowdsourcing task completed by that unit to obtain the accuracy of its labeling results.
Optionally, the majority-element-finding algorithm comprises a majority voting algorithm.
Optionally, checking the labeling results of the redundant data in each crowdsourcing work unit's task against the final answer for the redundant data includes:
comparing the labeling results of the redundant data in each unit's task with the final answer and marking each result as correct or incorrect.
Optionally, processing the labeling results of the crowdsourcing task completed by each crowdsourcing work unit according to the accuracy of that unit's labeling results includes:
comparing the accuracy of each unit's labeling results with a preset threshold;
and if the accuracy of a unit's labeling results is below the preset threshold, applying a processing measure to that unit.
Optionally, the preset threshold is set by the task publisher according to task requirements.
Optionally, the processing measure includes penalizing the crowdsourcing work unit and/or deleting the labeling results of that unit's crowdsourcing task.
Optionally, n is an integer in the range of 2 to 5.
According to another aspect of the embodiments of the present invention, there is also provided a crowdsourcing quality control apparatus, including:
a redundant data generation module adapted to randomly extract data from all crowdsourcing tasks according to a first designated proportion and make n copies of the extracted data, where the extracted data together with the copies serve as redundant data, the data that is not extracted serves as non-redundant data, and n is an integer greater than or equal to 2 and less than the number of crowdsourcing work units;
a crowdsourcing task allocation module adapted to distribute a total task consisting of the non-redundant data and the redundant data to crowdsourcing work units for labeling, where the tasks assigned to the same crowdsourcing work unit contain no repeated redundant data;
a data answer obtaining module adapted to obtain, after all the assigned tasks are labeled, a final answer for the redundant data from the labeling results of the redundant data in all assigned tasks by means of a majority-element-finding algorithm;
a redundant data checking module adapted to check the labeling results of the redundant data in each crowdsourcing work unit's task against the final answer for the redundant data to obtain each unit's accuracy on the redundant data, and to take this accuracy as the accuracy of that unit's labeling results; and
a labeling result processing module adapted to process the labeling results of the crowdsourcing task completed by each crowdsourcing work unit according to the accuracy of that unit's labeling results.
Optionally, the crowdsourcing task allocation module is further adapted to:
place the redundant data and the non-redundant data into the tasks assigned to each crowdsourcing work unit in a second designated proportion, where the second designated proportion is determined by the total number of crowdsourcing tasks, the first designated proportion, and the number of copies n.
Optionally, the crowdsourcing task allocation module is further adapted to:
place the redundant data and the non-redundant data into a data pool;
during labeling, let each crowdsourcing work unit draw one piece of data from the data pool at a time for labeling until all data in the pool are labeled, where the same crowdsourcing work unit cannot draw repeated data.
Optionally, the crowdsourcing task allocation module is further adapted to:
let each crowdsourcing work unit that draws data from the data pool keep drawing until the pool contains no more data that the unit has not labeled.
Optionally, the apparatus further comprises:
a spot-checking module adapted to spot-check the labeling results of the crowdsourcing task completed by a crowdsourcing work unit that receives no redundant data, to obtain the accuracy of that unit's labeling results.
Optionally, the majority-element-finding algorithm comprises a majority voting algorithm.
Optionally, the redundant data checking module is further adapted to:
compare the labeling results of the redundant data in each crowdsourcing work unit's task with the final answer for the redundant data and mark each result as correct or incorrect.
Optionally, the labeling result processing module is further adapted to:
compare the accuracy of each unit's labeling results with a preset threshold;
and if the accuracy of a unit's labeling results is below the preset threshold, apply a processing measure to that unit.
Optionally, the preset threshold is set by the task publisher according to task requirements.
Optionally, the processing measure includes penalizing the crowdsourcing work unit and/or deleting the labeling results of that unit's crowdsourcing task.
Optionally, n is an integer in the range of 2 to 5.
According to yet another aspect of embodiments of the present invention, there is also provided a computer storage medium having stored thereon computer program code which, when run on a computing device, causes the computing device to execute a crowdsourcing quality control method according to any one of the above.
According to still another aspect of the embodiments of the present invention, there is also provided a computing device including:
a processor; and
a memory storing computer program code;
the computer program code, when executed by the processor, causes the computing device to perform a crowdsourcing quality control method according to any one of the above.
According to the crowdsourcing quality control method and device provided by the embodiments of the present invention, data is randomly extracted from all crowdsourcing tasks according to a first designated proportion, several copies of the extracted data are made to serve as redundant data, and the single copy of the data that is not extracted serves as non-redundant data. The total task consisting of the redundant and non-redundant data is then distributed to crowdsourcing work units for labeling, with the guarantee that the tasks assigned to the same crowdsourcing work unit contain no repeated redundant data. After the crowdsourcing task is completed, a final answer for the redundant data is obtained from its labeling results by a majority-element-finding algorithm, and the labeling results of the redundant data in each unit's task are checked against this final answer to obtain each unit's accuracy on the redundant data. Finally, the overall completion of the crowdsourcing task is judged from each unit's accuracy on the redundant data, and the labeling results of the completed crowdsourcing task are processed accordingly. In this way, unsupervised crowdsourcing quality control is achieved while only slightly increasing the task volume, greatly reducing the capital and labor costs of crowdsourcing tasks.
The foregoing description is only an overview of the technical solutions of the present invention. In order that the technical means of the present invention may be more clearly understood, and that the above and other objects, features, and advantages of the present invention may become more apparent, specific embodiments of the invention are described below.
The above and other objects, advantages and features of the present invention will become more apparent to those skilled in the art from the following detailed description of specific embodiments thereof, taken in conjunction with the accompanying drawings.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
fig. 1 shows a flow diagram of a crowdsourcing quality control method according to an embodiment of the invention;
FIG. 2 illustrates a flow diagram of a crowdsourcing quality control method according to one embodiment of the invention;
FIG. 3 illustrates a flow diagram of a crowdsourcing quality control method according to another embodiment of the invention; and
fig. 4 is a schematic structural diagram of a crowdsourcing quality control device according to an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
To solve the above technical problem, an embodiment of the present invention provides a method for controlling crowdsourcing quality. Fig. 1 shows a flow chart of a crowdsourcing quality control method according to an embodiment of the invention. Referring to fig. 1, the method may include at least the following steps S102 to S110.
Step S102: randomly extract data from all crowdsourcing tasks according to a first designated proportion and make n copies of the extracted data, where the extracted data together with the copies serve as redundant data, the data that is not extracted serves as non-redundant data, and n is an integer greater than or equal to 2 and less than the number of crowdsourcing work units.
Step S104: distribute the total task consisting of the non-redundant data and the redundant data to crowdsourcing work units for labeling, where the tasks assigned to the same crowdsourcing work unit contain no repeated redundant data.
Step S106: after all the assigned tasks are labeled, obtain a final answer for the redundant data from the labeling results of the redundant data in all assigned tasks by means of a majority-element-finding algorithm.
Step S108: check the labeling results of the redundant data in each crowdsourcing work unit's task against the final answer for the redundant data to obtain each unit's accuracy on the redundant data, and take this accuracy as the accuracy of that unit's labeling results.
Step S110: process the labeling results of the crowdsourcing task completed by each crowdsourcing work unit according to the accuracy of that unit's labeling results.
The crowdsourcing quality control method provided by the embodiments of the present invention randomly extracts data from all crowdsourcing tasks according to a first designated proportion, makes several copies of the extracted data to serve as redundant data, and keeps the single copy of the unextracted data as non-redundant data. The total task consisting of the redundant and non-redundant data is then distributed to crowdsourcing work units for labeling, with the guarantee that the tasks assigned to the same crowdsourcing work unit contain no repeated redundant data. After the crowdsourcing task is completed, a final answer for the redundant data is obtained from its labeling results by a majority-element-finding algorithm, and the labeling results of the redundant data in each unit's task are checked against this final answer to obtain each unit's accuracy on the redundant data. Finally, the overall completion of the crowdsourcing task is judged from each unit's accuracy on the redundant data, and the labeling results of the completed crowdsourcing task are processed accordingly. In this way, unsupervised crowdsourcing quality control is achieved while only slightly increasing the task volume, greatly reducing the capital and labor costs of crowdsourcing tasks.
In step S102, the extraction proportion of the data (i.e., the first designated proportion) and the number of copies n are determined mainly by the difficulty of the crowdsourcing task and the capability level of the crowdsourcing work units. In general, the extraction proportion is set according to the budget and the total volume of the crowdsourcing task; for example, it may be set to 10%. Considering both cost and quality, the number of copies n is preferably an integer in the range of 2 to 5.
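To make step S102 concrete, here is a minimal Python sketch of the redundancy-generation step; the function name, the use of integer task identifiers, and the fixed random seed are illustrative assumptions, not part of the patented method.

```python
import random

def split_redundant(task_ids, first_ratio=0.1, n_copies=3, seed=0):
    """Randomly extract a proportion of tasks and make n_copies copies of
    each extracted item; the extracted items plus their copies form the
    redundant data, everything else the non-redundant data."""
    rng = random.Random(seed)
    k = int(len(task_ids) * first_ratio)
    extracted = set(rng.sample(task_ids, k))
    non_redundant = [t for t in task_ids if t not in extracted]
    # each extracted item ends up appearing n_copies + 1 times in total
    redundant = [t for t in extracted for _ in range(n_copies + 1)]
    return redundant, non_redundant

# e.g. 1000 tasks at 10% with n = 3 -> 400 redundant, 900 non-redundant
redundant, non_redundant = split_redundant(list(range(1000)))
```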
In step S104, distributing the total task consisting of the non-redundant data and the redundant data to crowdsourcing work units can be implemented in either of the following two ways.
First distribution mode: in the tasks assigned to each crowdsourcing work unit, redundant data and non-redundant data are placed in a second designated proportion, where the second designated proportion is determined by the total number of crowdsourcing tasks, the first designated proportion, and the number of copies n.
The first distribution mode is described below with reference to a specific example. A code sketch of the job construction is given after the following note.
Assume the total crowdsourcing task contains 1000 pieces of data and the estimated number of crowdsourcing work units is 4. 10% of the data (i.e., 100 pieces) is extracted, and 3 copies are made of each extracted piece, yielding 400 pieces of redundant data and 900 pieces of non-redundant data. For task allocation, the 900 non-redundant pieces are divided into 20 jobs of 45 pieces each. Since only 100 of the 400 redundant pieces are distinct and the remaining 300 are duplicates, the 20 jobs are divided into 4 shares of 5 jobs each to ensure that no single job contains duplicate data: within each share, the 100 distinct redundant pieces are spread across the 5 jobs, putting 20 redundant pieces into each job. Each job therefore contains 65 pieces of data, 20 of which are redundant. The crowdsourcing work units (i.e., the labeling units) then take jobs, one job at a time; the jobs taken by the same unit must share no repeated data, and each unit may take at most 5 jobs, which guarantees that all jobs taken by one unit contain no repeated redundant data.
It should be noted that the above examples are only illustrative and are not to be construed as limiting the present invention.
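The following Python sketch shows one way the job construction in the example above could be implemented; the partitioning scheme and the function signature are one possible reading of the example, assumed for illustration only.

```python
def build_jobs(non_redundant, distinct_redundant, n_copies=3,
               jobs_per_share=5, per_job_nonred=45):
    """Partition the non-redundant items into shares * jobs_per_share jobs
    and spread the distinct redundant items over each share so that no
    single job contains duplicate data (here: 20 jobs of 45 + 20 items)."""
    shares = n_copies + 1                        # 4 shares of 5 jobs each
    total_jobs = shares * jobs_per_share         # 20 jobs
    jobs = [non_redundant[i * per_job_nonred:(i + 1) * per_job_nonred]
            for i in range(total_jobs)]
    per_job_red = len(distinct_redundant) // jobs_per_share  # 20 per job
    for s in range(shares):
        for j in range(jobs_per_share):
            chunk = distinct_redundant[j * per_job_red:(j + 1) * per_job_red]
            jobs[s * jobs_per_share + j].extend(chunk)
    return jobs   # each job: 45 non-redundant + 20 redundant = 65 items
```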
Second distribution mode: the redundant data and the non-redundant data are placed into a data pool. During labeling, each crowdsourcing work unit draws one piece of data from the pool at a time for labeling until all data in the pool are labeled, where the same crowdsourcing work unit cannot draw repeated data; this ensures that the task assigned to any single unit contains no repeated redundant data.
Under the second distribution mode, to ensure the completion of the crowdsourcing task, each crowdsourcing work unit that draws data from the pool keeps drawing until the pool contains no more data that the unit has not labeled. On the premise that the task assigned to any single unit contains no repeated redundant data, this maximizes each unit's task completion rate and reduces the extra management cost that too many work units would incur.
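A minimal sketch of the second distribution mode follows, assuming data items are identified by hashable ids and that several instances of the same redundant id may sit in the pool; the class design is an assumption for illustration.

```python
class DataPool:
    """Data pool for the second distribution mode: each work unit draws one
    item at a time and is never handed an id it has already labeled."""
    def __init__(self, items):
        self.items = list(items)   # may hold several instances of one id
        self.seen = {}             # unit id -> set of ids already drawn

    def draw(self, unit_id):
        seen = self.seen.setdefault(unit_id, set())
        for i, item in enumerate(self.items):
            if item not in seen:   # skip ids this unit has already labeled
                seen.add(item)
                return self.items.pop(i)
        return None  # nothing left in the pool that this unit has not seen

# a unit keeps calling pool.draw(unit_id) until it returns None
```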
In step S106, after all crowdsourcing tasks are labeled, the final answer for the redundant data is obtained from its labeling results by a majority-element-finding algorithm. Such algorithms include counting-based methods, sorting and taking the middle element, and the like.
Preferably, the majority-element-finding algorithm is the majority voting algorithm.
The majority voting algorithm (Majority Vote), also called the Boyer-Moore voting algorithm, finds the majority element in a sequence of elements with linear time complexity O(n) and constant space complexity O(1). It is a classic streaming algorithm that quickly finds the element appearing more than half the time in an array, i.e., the majority element.
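For reference, a standard Python implementation of the Boyer-Moore majority vote is shown below, applied per redundant item to the n+1 labels it received; the verification pass and the None return when no strict majority exists are implementation choices rather than requirements of the patent.

```python
def majority_vote(labels):
    """Boyer-Moore majority vote: O(n) time, O(1) extra space. Returns the
    element occurring more than half the time, or None if there is none."""
    candidate, count = None, 0
    for label in labels:
        if count == 0:
            candidate, count = label, 1
        elif label == candidate:
            count += 1
        else:
            count -= 1
    # second pass: confirm the candidate really is a strict majority
    if labels and sum(1 for l in labels if l == candidate) * 2 > len(labels):
        return candidate
    return None

# final answer for one redundant item, e.g. labels from 4 instances
assert majority_vote(["cat", "cat", "dog", "cat"]) == "cat"
```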
In step S108, the labeling results of the redundant data in each crowdsourcing work unit's task are checked against the final answer obtained for the redundant data, yielding each unit's accuracy on the redundant data.
In an alternative embodiment, checking the labeling results of the redundant data in each unit's task against the final answer can be implemented as follows:
compare each unit's labeling result for each piece of redundant data with the final answer and mark it as correct or incorrect.
Each unit's accuracy on the redundant data is then taken as the accuracy of that unit's labeling results, and the unit's overall completion of the crowdsourcing task is judged from this accuracy.
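A sketch of this check in Python, under the assumption that labels are stored in dicts keyed by item id; returning None for units that saw no redundant data defers them to the spot check described next.

```python
def unit_accuracy(unit_labels, final_answers):
    """Accuracy of one unit on the redundant data it labeled.
    unit_labels / final_answers: dicts mapping item id -> label."""
    checked = [(iid, lab) for iid, lab in unit_labels.items()
               if iid in final_answers]
    if not checked:
        return None   # unit received no redundant data; spot-check instead
    correct = sum(1 for iid, lab in checked if lab == final_answers[iid])
    return correct / len(checked)
```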
In addition, under the second distribution mode described above, some crowdsourcing work units may draw no redundant data because of the randomness of data distribution. Therefore, in a preferred embodiment of the present invention, for a crowdsourcing work unit that receives no redundant data, the labeling results of the crowdsourcing task it completed may be spot-checked to obtain the accuracy of its labeling results, and the unit's overall completion of the crowdsourcing task may be judged from this accuracy. By combining automatic checking with spot-checking, the overall crowdsourcing labeling quality is assessed with only a small amount of data, saving labeling cost while preserving control over the quality of crowdsourcing task completion.
The accuracy data obtained by checking the labeling results according to an embodiment of the present invention may typically be presented as in the following table:
| # | Crowdsourcing work unit | Number of completions | Current batch accuracy |
| --- | --- | --- | --- |
| 1 | AA3340 | 306 | 86.67% (65/75) |
| 2 | BB666 | 276 | 91.89% (68/74) |
| 3 | CC927 | 280 | 90.36% (75/83) |
| 4 | DD1998 | 290 | 87.5% (56/64) |
| 5 | EE233 | 88 | 85.71% (24/28) |
| 6 | FF87 | 327 | 87.21% (75/86) |
| 7 | GG959 | 271 | 84.72% (61/72) |
| 8 | HH595 | 238 | 90% (45/50) |
| 9 | II1025 | 469 | 92.31% (96/104) |
| 10 | JJ0618 | 269 | 84.29% (59/70) |
It should be noted that the data in the above table are only illustrative and should not be construed as limiting the invention.
In step S110, the labeling results of the crowdsourcing task completed by each crowdsourcing work unit are processed according to the accuracy of that unit's labeling results, ensuring control over the quality of crowdsourcing task completion.
In an alternative embodiment, step S110 may be implemented as the following steps:
First, the accuracy of each unit's labeling results is compared with a preset threshold.
Then, if the accuracy of a unit's labeling results is below the preset threshold, a processing measure is applied to that unit. The processing measure may include penalizing the crowdsourcing work unit and/or deleting the labeling results of that unit's crowdsourcing task.
Preferably, the preset threshold can be customized by the task publisher according to task requirements, which improves the flexibility of crowdsourcing tasks.
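A minimal sketch of this processing step; the 0.85 threshold is an arbitrary placeholder for the publisher-defined value, and passing through units whose accuracy is None (no redundant data drawn) is an assumption that they are handled by spot-checking.

```python
def process_results(unit_accuracies, unit_results, threshold=0.85):
    """Split units into accepted and rejected by labeling accuracy; the
    rejected results would be deleted and/or the unit penalized."""
    accepted, rejected = {}, {}
    for unit, acc in unit_accuracies.items():
        if acc is not None and acc < threshold:
            rejected[unit] = unit_results[unit]
        else:
            accepted[unit] = unit_results[unit]  # None -> spot-check path
    return accepted, rejected
```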
Having introduced various implementations of each stage of the embodiment shown in Fig. 1, the implementation process of the crowdsourcing quality control method of the present invention is described in detail below through specific embodiments.
Example one
Fig. 2 shows a flow chart of a crowdsourcing quality control method according to an embodiment of the invention. Referring to fig. 2, the method may include the following steps S202 to S212.
Step S202: randomly extract data from all crowdsourcing tasks according to a first designated proportion and make n copies of the extracted data, where the extracted data together with the copies serve as redundant data, the data that is not extracted serves as non-redundant data, and n is an integer greater than or equal to 2 and less than the number of crowdsourcing work units.
Step S204: place redundant data and non-redundant data into the tasks assigned to each crowdsourcing work unit in a second designated proportion, where the second designated proportion is determined by the total number of crowdsourcing tasks, the first designated proportion, and the number of copies n, and the tasks assigned to the same crowdsourcing work unit contain no repeated redundant data.
Step S206: after all the assigned tasks are labeled, obtain a final answer for the redundant data from the labeling results of the redundant data in all assigned tasks by a majority voting algorithm.
Step S208: check the labeling results of the redundant data in each crowdsourcing work unit's task against the final answer for the redundant data to obtain each unit's accuracy on the redundant data, and take this accuracy as the accuracy of that unit's labeling results.
Step S210: compare the accuracy of each unit's labeling results with a preset threshold.
Step S212: if the accuracy of a unit's labeling results is below the preset threshold, penalize that unit and/or delete the labeling results of its crowdsourcing task.
The crowdsourcing quality control method provided by this embodiment achieves unsupervised crowdsourcing quality control while only slightly increasing the task volume, greatly reducing the capital and labor costs of crowdsourcing tasks.
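To tie Example one together, the sketch below strings the helpers sketched earlier (split_redundant, build_jobs, majority_vote, unit_accuracy, process_results) into an end-to-end simulation; the simulated labeler, its 10% error rate, and the one-job-per-unit setup are purely illustrative assumptions.

```python
import random
from collections import defaultdict

def simulate_example_one(seed=0):
    rng = random.Random(seed)
    redundant, non_redundant = split_redundant(list(range(1000)), seed=seed)
    jobs = build_jobs(non_redundant, sorted(set(redundant)))
    votes = defaultdict(list)         # item id -> labels from all units
    unit_labels = defaultdict(dict)   # unit -> {item id: label}
    for unit, job in enumerate(jobs): # toy setup: one unit takes one job
        for item in job:
            truth = item % 2          # pretend ground truth for the demo
            label = truth if rng.random() > 0.1 else 1 - truth
            unit_labels[unit][item] = label
            votes[item].append(label)
    # redundant items are exactly those labeled by more than one unit
    finals = {i: majority_vote(v) for i, v in votes.items() if len(v) > 1}
    accs = {u: unit_accuracy(lab, finals) for u, lab in unit_labels.items()}
    return process_results(accs, unit_labels, threshold=0.85)
```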
Example two
Fig. 3 shows a flow chart of a crowdsourcing quality control method according to another embodiment of the invention. Referring to fig. 3, the method may include the following steps S302 to S314.
Step S302: randomly extract data from all crowdsourcing tasks according to a first designated proportion and make n copies of the extracted data, where the extracted data together with the copies serve as redundant data, the data that is not extracted serves as non-redundant data, and n is an integer greater than or equal to 2 and less than the number of crowdsourcing work units.
Step S304: place the redundant data and the non-redundant data into a data pool; during labeling, each crowdsourcing work unit draws one piece of data from the pool at a time for labeling until all data in the pool are labeled, where the same crowdsourcing work unit cannot draw repeated data.
Step S306: after all data in the pool are labeled, obtain a final answer for the redundant data from the labeling results of the redundant data received by all crowdsourcing work units, using a majority voting algorithm.
Step S308: for each crowdsourcing work unit that received redundant data, check that unit's labeling results on the redundant data against the final answer to obtain the unit's accuracy on the redundant data, and take this accuracy as the accuracy of the unit's labeling results.
Step S310: for each crowdsourcing work unit that received no redundant data, spot-check the labeling results of the crowdsourcing task completed by that unit to obtain the accuracy of its labeling results (a sketch of such a spot check follows these steps).
Step S312: compare the accuracy of each unit's labeling results with a preset threshold.
Step S314: if the accuracy of a unit's labeling results is below the preset threshold, penalize that unit and/or delete the labeling results of its crowdsourcing task.
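A sketch of the spot check in step S310, under the assumption that a manual review can be wrapped as a callable gold_check(item, label) -> bool; the sample size of 30 is an arbitrary illustrative choice.

```python
import random

def spot_check(unit_results, gold_check, sample_size=30, seed=0):
    """Estimate a unit's accuracy by manually reviewing a random sample of
    its labeled items (for units that drew no redundant data)."""
    rng = random.Random(seed)
    items = list(unit_results.items())
    sample = rng.sample(items, min(sample_size, len(items)))
    if not sample:
        return None
    correct = sum(1 for item, label in sample if gold_check(item, label))
    return correct / len(sample)
```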
The crowdsourcing quality control method provided by this embodiment distributes redundant and non-redundant data randomly through the data pool, reducing the complexity of task allocation. Meanwhile, by combining automatic checking with spot-checking, the overall crowdsourcing labeling quality is assessed with only a small amount of data, saving labeling cost while preserving control over the quality of crowdsourcing task completion.
It should be noted that, in practical applications, all the above optional embodiments may be combined arbitrarily to form optional embodiments of the present invention; details are not repeated here.
Based on the same inventive concept, an embodiment of the present invention further provides a crowdsourcing quality control device, which supports the crowdsourcing quality control method provided by any one of the above embodiments or a combination thereof. Fig. 4 is a schematic structural diagram of a crowdsourcing quality control device according to an embodiment of the invention. Referring to Fig. 4, the apparatus may include at least: a redundant data generation module 410, a crowdsourcing task allocation module 420, a data answer obtaining module 430, a redundant data checking module 440, and a labeling result processing module 450.
The functions of the components of the crowdsourcing quality control device of the embodiment of the present invention and the connections among them are described below:
The redundant data generation module 410 is adapted to randomly extract data from all crowdsourcing tasks according to a first designated proportion and make n copies of the extracted data, where the extracted data together with the copies serve as redundant data, the data that is not extracted serves as non-redundant data, and n is an integer greater than or equal to 2 and less than the number of crowdsourcing work units.
The crowdsourcing task allocation module 420, connected to the redundant data generation module 410, is adapted to distribute a total task consisting of the non-redundant data and the redundant data to crowdsourcing work units for labeling, where the tasks assigned to the same crowdsourcing work unit contain no repeated redundant data.
The data answer obtaining module 430, connected to the crowdsourcing task allocation module 420, is adapted to obtain, after all the assigned tasks are labeled, a final answer for the redundant data from the labeling results of the redundant data in all assigned tasks by means of a majority-element-finding algorithm.
The redundant data checking module 440, connected to the data answer obtaining module 430, is adapted to check the labeling results of the redundant data in each crowdsourcing work unit's task against the final answer for the redundant data to obtain each unit's accuracy on the redundant data, and to take this accuracy as the accuracy of that unit's labeling results.
The labeling result processing module 450, connected to the redundant data checking module 440, is adapted to process the labeling results of the crowdsourcing task completed by each crowdsourcing work unit according to the accuracy of that unit's labeling results.
In an alternative embodiment, the crowdsourcing task allocation module 420 is further adapted to:
place redundant data and non-redundant data into the tasks assigned to each crowdsourcing work unit in a second designated proportion, where the second designated proportion is determined by the total number of crowdsourcing tasks, the first designated proportion, and the number of copies n.
In another alternative embodiment, the crowdsourcing task allocation module 420 is further adapted to:
place the redundant data and the non-redundant data into a data pool;
during labeling, let each crowdsourcing work unit draw one piece of data from the pool at a time for labeling until all data in the pool are labeled, where the same crowdsourcing work unit cannot draw repeated data.
Further, to ensure the completion of the crowdsourcing task, the crowdsourcing task allocation module 420 is further adapted to:
let each crowdsourcing work unit that draws data from the data pool keep drawing until the pool contains no more data that the unit has not labeled.
Further, to preserve control over the quality of crowdsourcing task completion, and considering that some units may draw no redundant data due to the randomness of data distribution when units draw data from the pool, the crowdsourcing quality control device may further include:
a spot-checking module adapted to spot-check the labeling results of the crowdsourcing task completed by a crowdsourcing work unit that receives no redundant data, to obtain the accuracy of that unit's labeling results.
In an alternative embodiment, the majority-element-finding algorithm comprises a majority voting algorithm.
In an alternative embodiment, the redundant data checking module 440 is further adapted to:
compare the labeling results of the redundant data in each crowdsourcing work unit's task with the final answer for the redundant data and mark each result as correct or incorrect.
In an alternative embodiment, the labeling result processing module 450 is further adapted to:
compare the accuracy of each unit's labeling results with a preset threshold;
and if the accuracy of a unit's labeling results is below the preset threshold, apply a processing measure to that unit. The processing measure may include penalizing the crowdsourcing work unit and/or deleting the labeling results of that unit's crowdsourcing task.
Preferably, the preset threshold is set by the task publisher according to task requirements.
In an alternative embodiment, the number of copies n is an integer in the range of 2 to 5.
Based on the same inventive concept, the embodiment of the invention also provides a computer storage medium. The computer storage medium stores computer program code which, when run on a computing device, causes the computing device to perform a crowdsourcing quality control method according to any one or combination of the above embodiments.
Based on the same inventive concept, the embodiment of the invention also provides the computing equipment. The computing device may include:
a processor; and
a memory storing computer program code;
the computer program code, when executed by a processor, causes the computing device to perform a crowdsourcing quality control method according to any one or combination of the above embodiments.
Any one of the above optional embodiments, or a combination thereof, can achieve the following beneficial effects:
According to the crowdsourcing quality control method and device provided by the embodiments of the present invention, data is randomly extracted from all crowdsourcing tasks according to a first designated proportion, several copies of the extracted data are made to serve as redundant data, and the single copy of the data that is not extracted serves as non-redundant data. The total task consisting of the redundant and non-redundant data is then distributed to crowdsourcing work units for labeling, with the guarantee that the tasks assigned to the same crowdsourcing work unit contain no repeated redundant data. After the crowdsourcing task is completed, a final answer for the redundant data is obtained from its labeling results by a majority-element-finding algorithm, and the labeling results of the redundant data in each unit's task are checked against this final answer to obtain each unit's accuracy on the redundant data. Finally, the overall completion of the crowdsourcing task is judged from each unit's accuracy on the redundant data, and the labeling results of the completed crowdsourcing task are processed accordingly. In this way, unsupervised crowdsourcing quality control is achieved while only slightly increasing the task volume, greatly reducing the capital and labor costs of crowdsourcing tasks.
It is clear to those skilled in the art that the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and for the sake of brevity, further description is omitted here.
In addition, the functional units in the embodiments of the present invention may be physically independent of each other, two or more functional units may be integrated together, or all the functional units may be integrated in one processing unit. The integrated functional units may be implemented in the form of hardware, or in the form of software or firmware.
Those of ordinary skill in the art will understand that the integrated functional units, if implemented in software and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computing device (e.g., a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), and a magnetic or optical disk.
Alternatively, all or part of the steps of implementing the foregoing method embodiments may be implemented by hardware (such as a computing device, e.g., a personal computer, a server, or a network device) associated with program instructions, which may be stored in a computer-readable storage medium, and when the program instructions are executed by a processor of the computing device, the computing device executes all or part of the steps of the method according to the embodiments of the present invention.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments can be modified or some or all of the technical features can be equivalently replaced within the spirit and principle of the present invention; such modifications or substitutions do not depart from the scope of the present invention.
According to an aspect of the embodiments of the present invention, there is provided A1. a crowdsourcing quality control method, including:
randomly extracting data from all crowdsourcing tasks according to a first designated proportion and making n copies of the extracted data, where the extracted data together with the copies serve as redundant data, the data that is not extracted serves as non-redundant data, and n is an integer greater than or equal to 2 and less than the number of crowdsourcing work units;
distributing a total task consisting of the non-redundant data and the redundant data to crowdsourcing work units for labeling, where the tasks assigned to the same crowdsourcing work unit contain no repeated redundant data;
after all the assigned tasks are labeled, obtaining a final answer for the redundant data from the labeling results of the redundant data in all assigned tasks by means of a majority-element-finding algorithm;
checking the labeling results of the redundant data in each crowdsourcing work unit's task against the final answer for the redundant data to obtain each unit's accuracy on the redundant data, and taking this accuracy as the accuracy of that unit's labeling results;
and processing the labeling results of the crowdsourcing task completed by each crowdsourcing work unit according to the accuracy of that unit's labeling results.
A2. The method of A1, wherein distributing the total task consisting of the non-redundant data and the redundant data to crowdsourcing work units comprises:
placing the redundant data and the non-redundant data into the tasks assigned to each crowdsourcing work unit in a second designated proportion, where the second designated proportion is determined by the total number of crowdsourcing tasks, the first designated proportion, and the number of copies n.
A3. The method of A1 or A2, wherein distributing the total task consisting of the non-redundant data and the redundant data to crowdsourcing work units comprises:
placing the redundant data and the non-redundant data into a data pool;
during labeling, each crowdsourcing work unit draws one piece of data from the data pool at a time for labeling until all data in the pool are labeled, where the same crowdsourcing work unit cannot draw repeated data.
A4. The method of any one of A1-A3, wherein, when the crowdsourcing work units draw data from the data pool for labeling, each unit keeps drawing data from the pool until the pool contains no more data that the unit has not labeled.
A5. The method of any one of A1-A4, further comprising:
for a crowdsourcing work unit that receives no redundant data, spot-checking the labeling results of the crowdsourcing task completed by that unit to obtain the accuracy of its labeling results.
A6. The method of any one of A1-A5, wherein the majority-element-finding algorithm comprises a majority voting algorithm.
A7. The method of any one of A1-A6, wherein checking the labeling results of the redundant data in each crowdsourcing work unit's task against the final answer for the redundant data comprises:
comparing the labeling results of the redundant data in each unit's task with the final answer and marking each result as correct or incorrect.
A8. The method of any one of A1-A7, wherein processing the labeling results of the crowdsourcing task completed by each crowdsourcing work unit according to the accuracy of that unit's labeling results comprises:
comparing the accuracy of each unit's labeling results with a preset threshold;
and if the accuracy of a unit's labeling results is below the preset threshold, applying a processing measure to that unit.
A9. The method of any one of A1-A8, wherein the preset threshold is set by the task publisher according to task requirements.
A10. The method of any one of A1-A9, wherein the processing measure includes penalizing the crowdsourcing work unit and/or deleting the labeling results of that unit's crowdsourcing task.
A11. The method of any one of A1-A10, wherein n is an integer in the range of 2-5.
According to another aspect of the embodiments of the present invention, there is also provided b12. a crowdsourcing quality control apparatus, including:
the system comprises a redundant data generation module, a crowdsourcing task module and a crowdsourcing task module, wherein the redundant data generation module is suitable for randomly extracting data from all crowdsourcing tasks according to a first specified proportion and copying n parts of the extracted data, the extracted data and the copied data serve as redundant data, data which are not extracted serve as non-redundant data, and n is an integer which is greater than or equal to 2 and less than the number of crowdsourcing work units;
the crowdsourcing task allocation module is suitable for allocating a total task consisting of the non-redundant data and the redundant data to a crowdsourcing work unit for marking, wherein the tasks allocated by the same crowdsourcing work unit do not contain repeated redundant data;
the data answer obtaining module is suitable for obtaining a final answer of the redundant data through an algorithm for searching a plurality of elements according to the labeling result of the redundant data in all the distributed tasks after all the distributed tasks are labeled;
the redundant data checking module is suitable for checking the marking result of the redundant data in the task of each crowdsourcing work unit according to the final answer of the redundant data to obtain the correct rate of each crowdsourcing work unit on the redundant data, and taking the correct rate of each crowdsourcing work unit on the redundant data as the correct rate of the marking result of each crowdsourcing work unit; and
and the labeling result processing module is suitable for processing the labeling results of the crowdsourcing tasks completed by each crowdsourcing work unit according to the accuracy of the labeling results of the crowdsourcing work units.
B13. The apparatus of B12, wherein the crowdsourcing task allocation module is further adapted to:
place, in the task allocated to each crowdsourcing work unit, the redundant data and the non-redundant data according to a second specified proportion, wherein the second specified proportion is determined by the total number of crowdsourcing task items, the first specified proportion and the copy count n.
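One plausible reading of B13, stated explicitly as an assumption: if N items exist in total, a proportion p is extracted and each extracted item is copied n times, then the pool holds N·p·(n+1) redundant entries and N·(1−p) non-redundant entries, and the second specified proportion could be computed as below.

    def second_proportion(total, p, n):
        """Fraction of redundant entries in the combined pool, assuming the
        extracted items and all n copies of each count as redundant data."""
        redundant = total * p * (n + 1)
        non_redundant = total * (1 - p)
        return redundant / (redundant + non_redundant)

For instance, with total=1000, p=0.1 and n=2, the pool holds 300 redundant and 900 non-redundant entries, giving a second proportion of 0.25.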
B14. The apparatus of B12 or B13, wherein the crowdsourcing task allocation module is further adapted to:
place the redundant data and the non-redundant data into a data pool;
during labeling, have each crowdsourcing work unit draw one piece of data from the data pool at a time for labeling until all the data in the data pool has been labeled, wherein the same crowdsourcing work unit cannot draw repeated data.
B15. The apparatus of any one of B12-B14, wherein the crowdsourcing task allocation module is further adapted to:
when a crowdsourcing work unit draws data from the data pool for labeling, allow it to keep drawing data from the data pool until the pool contains no data that the work unit has not yet labeled.
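A hedged sketch of the data pool behavior described in B14 and B15: entries are drawn one at a time, a work unit never draws a payload it has already labeled, and it can keep drawing until nothing new remains for it. The class and method names are assumptions for illustration.

    import random

    class DataPool:
        def __init__(self, entries):
            # entries: list of (entry_id, payload); copies of the same
            # redundant item share a payload but carry distinct entry ids.
            self.entries = list(entries)
            self.labeled = {}  # unit_id -> set of payloads already labeled

        def draw(self, unit_id):
            """Hand one not-yet-labeled entry to this work unit, or return
            None when the pool holds nothing new for this unit (B15)."""
            seen = self.labeled.setdefault(unit_id, set())
            candidates = [i for i, (_, payload) in enumerate(self.entries)
                          if payload not in seen]
            if not candidates:
                return None
            entry_id, payload = self.entries.pop(random.choice(candidates))
            seen.add(payload)
            return entry_id, payload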
B16. The apparatus of any one of B12-B15, further comprising:
a sampling check module, adapted to sample and check the labeling results of the crowdsourcing tasks completed by a crowdsourcing work unit that has received no redundant data, to obtain the accuracy of the labeling results of that work unit.
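For the sampling check module of B16, a minimal sketch under the assumption that a reviewer function can judge a single labeled result; the default sample size is an arbitrary illustrative value.

    import random

    def spot_check(labeled_results, is_correct, sample_size=20):
        """Randomly sample some of a work unit's labeled results, have the
        reviewer judge each one, and return the observed accuracy."""
        if not labeled_results:
            return 0.0
        sample = random.sample(labeled_results,
                               min(sample_size, len(labeled_results)))
        return sum(1 for r in sample if is_correct(r)) / len(sample)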
B17. The apparatus of any one of B12-B16, wherein the algorithm for finding majority elements comprises a majority voting algorithm.
B18. The apparatus of any one of B12-B17, wherein the redundant data checking module is further adapted to:
compare the labeling result of the redundant data in the task of each crowdsourcing work unit with the final answer for the redundant data, and judge whether each labeling result is right or wrong.
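A sketch of the check in B18, assuming the work unit's labels and the final answers are both keyed by the id of the underlying redundant item; the dict layout is an assumption for illustration.

    def redundant_accuracy(unit_labels, final_answers):
        """unit_labels: {item_id: label} for one work unit;
        final_answers: {item_id: majority label}. Returns this unit's
        accuracy on the redundant data it labeled, or None if it saw none."""
        judged = [item_id for item_id in unit_labels
                  if item_id in final_answers]
        if not judged:
            return None  # this unit received no redundant data (see B16)
        correct = sum(1 for item_id in judged
                      if unit_labels[item_id] == final_answers[item_id])
        return correct / len(judged)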
B19. The apparatus of any one of B12-B18, wherein the labeling result processing module is further adapted to:
compare the accuracy of the labeling results of each crowdsourcing work unit with a preset threshold; and
if the accuracy of the labeling results of a crowdsourcing work unit is below the preset threshold, apply a processing measure to that crowdsourcing work unit.
B20. The apparatus of any one of B12-B19, wherein the preset threshold is set by the task publisher according to the task requirements.
B21. The apparatus of any one of B12-B20, wherein the processing measure comprises penalizing the crowdsourcing work unit and/or deleting the labeling results of the crowdsourcing tasks completed by that work unit.
B22. The apparatus of any one of B12-B21, wherein n is an integer in the range of 2-5.
According to yet another aspect of the embodiments of the present invention, there is also provided C23. A computer storage medium storing computer program code which, when run on a computing device, causes the computing device to perform the crowdsourcing quality control method of any one of A1-A11.
There is also provided, in accordance with yet another aspect of the embodiments of the present invention, a computing device, comprising:
a processor; and
a memory storing computer program code;
wherein the computer program code, when executed by the processor, causes the computing device to perform the crowdsourcing quality control method of any one of A1-A11.

Claims (10)

1. A crowdsourcing quality control method, comprising:
randomly extracting data from all crowdsourcing tasks according to a first specified proportion and making n copies of the extracted data, wherein the extracted data and the copies serve as redundant data, the data not extracted serves as non-redundant data, and n is an integer greater than or equal to 2 and less than the number of crowdsourcing work units;
distributing a total task consisting of the non-redundant data and the redundant data to crowdsourcing work units for labeling, wherein the task distributed to any single crowdsourcing work unit does not contain repeated redundant data;
after all the distributed tasks have been labeled, obtaining the final answer for the redundant data through an algorithm for finding majority elements according to the labeling results of the redundant data in all the distributed tasks;
verifying the labeling results of the redundant data in the task of each crowdsourcing work unit against the final answer for the redundant data to obtain the accuracy of each crowdsourcing work unit on the redundant data, and taking that accuracy as the accuracy of the labeling results of that crowdsourcing work unit; and
processing the labeling results of the crowdsourcing tasks completed by each crowdsourcing work unit according to the accuracy of the labeling results of that crowdsourcing work unit.
2. The method of claim 1, wherein distributing a total task consisting of the non-redundant data and the redundant data to crowdsourcing work units comprises:
placing, in the task distributed to each crowdsourcing work unit, the redundant data and the non-redundant data according to a second specified proportion, wherein the second specified proportion is determined by the total number of crowdsourcing task items, the first specified proportion and the copy count n.
3. The method of claim 1 or 2, wherein distributing a total task consisting of the non-redundant data and the redundant data to crowdsourcing work units comprises:
placing the redundant data and the non-redundant data into a data pool;
during labeling, each crowdsourcing work unit draws one piece of data from the data pool at a time for labeling until all the data in the data pool has been labeled, wherein the same crowdsourcing work unit cannot draw repeated data.
4. The method of any one of claims 1-3, wherein, when drawing data from the data pool for labeling, each crowdsourcing work unit can keep drawing data from the data pool until the pool contains no data that the work unit has not yet labeled.
5. The method of any one of claims 1-4, further comprising:
for a crowdsourcing work unit that has received no redundant data, sampling and checking the labeling results of the crowdsourcing tasks completed by that work unit to obtain the accuracy of its labeling results.
6. The method of any one of claims 1-5, wherein the algorithm for finding majority elements comprises a majority voting algorithm.
7. The method of any one of claims 1-6, wherein checking the labeling results of the redundant data in the task of each crowdsourcing work unit against the final answer for the redundant data comprises:
comparing the labeling result of the redundant data in the task of each crowdsourcing work unit with the final answer for the redundant data, and judging whether each labeling result is right or wrong.
8. A crowdsourcing quality control apparatus, comprising:
a redundant data generation module, adapted to randomly extract data from all crowdsourcing tasks according to a first specified proportion and make n copies of the extracted data, wherein the extracted data and the copies serve as redundant data, the data not extracted serves as non-redundant data, and n is an integer greater than or equal to 2 and less than the number of crowdsourcing work units;
a crowdsourcing task allocation module, adapted to allocate a total task consisting of the non-redundant data and the redundant data to crowdsourcing work units for labeling, wherein the task allocated to any single crowdsourcing work unit does not contain repeated redundant data;
a data answer obtaining module, adapted to obtain, after all the allocated tasks have been labeled, the final answer for the redundant data through an algorithm for finding majority elements according to the labeling results of the redundant data in all the allocated tasks;
a redundant data checking module, adapted to check the labeling results of the redundant data in the task of each crowdsourcing work unit against the final answer for the redundant data, obtain the accuracy of each crowdsourcing work unit on the redundant data, and take that accuracy as the accuracy of the labeling results of that crowdsourcing work unit; and
a labeling result processing module, adapted to process the labeling results of the crowdsourcing tasks completed by each crowdsourcing work unit according to the accuracy of the labeling results of that work unit.
9. A computer storage medium storing computer program code which, when run on a computing device, causes the computing device to perform the crowdsourcing quality control method of any one of claims 1-7.
10. A computing device, comprising:
a processor; and
a memory storing computer program code;
wherein the computer program code, when executed by the processor, causes the computing device to perform the crowdsourcing quality control method of any one of claims 1-7.
CN201811554257.6A 2018-12-18 2018-12-18 Crowd-sourced quality control method, device, computer storage medium and computing equipment Active CN111339068B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811554257.6A CN111339068B (en) 2018-12-18 2018-12-18 Crowd-sourced quality control method, device, computer storage medium and computing equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811554257.6A CN111339068B (en) 2018-12-18 2018-12-18 Crowd-sourced quality control method, device, computer storage medium and computing equipment

Publications (2)

Publication Number Publication Date
CN111339068A true CN111339068A (en) 2020-06-26
CN111339068B CN111339068B (en) 2024-04-19

Family

ID=71183263

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811554257.6A Active CN111339068B (en) 2018-12-18 2018-12-18 Crowd-sourced quality control method, device, computer storage medium and computing equipment

Country Status (1)

Country Link
CN (1) CN111339068B (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140165071A1 (en) * 2012-12-06 2014-06-12 Xerox Corporation Method and system for managing allocation of tasks to be crowdsourced
US20140343984A1 (en) * 2013-03-14 2014-11-20 University Of Southern California Spatial crowdsourcing with trustworthy query answering
US20150235160A1 (en) * 2014-02-20 2015-08-20 Xerox Corporation Generating gold questions for crowdsourcing
CN104599084A (en) * 2015-02-12 2015-05-06 北京航空航天大学 Crowd calculation quality control method and device
US20170277678A1 (en) * 2016-03-24 2017-09-28 Document Crowdsourced Proof Reading, LLC Document crowdsourced proofreading system and method
CN106779307A (en) * 2016-11-22 2017-05-31 崔岩 The data processing method and system of cubic management system
CN108537240A (en) * 2017-03-01 2018-09-14 华东师范大学 Commodity image semanteme marking method based on domain body
CN107273492A (en) * 2017-06-15 2017-10-20 复旦大学 A kind of exchange method based on mass-rent platform processes image labeling task
CN107729378A (en) * 2017-07-13 2018-02-23 华中科技大学 A kind of data mask method
CN107679766A (en) * 2017-10-24 2018-02-09 北京航空航天大学 A kind of gunz task dynamic redundancy dispatching method and device
CN108154306A (en) * 2017-12-28 2018-06-12 百度在线网络技术(北京)有限公司 Task processing method, device, equipment and the computer readable storage medium of crowdsourcing model
CN108647858A (en) * 2018-04-12 2018-10-12 华东师范大学 A kind of collaboration crowdsourcing method of quality control based on user's inconsistency information

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JIYI LI et al.: "Incorporating Worker Similarity for Label Aggregation in Crowdsourcing", ICANN 2018: Artificial Neural Networks and Machine Learning, 26 September 2018 (2018-09-26), pages 596-606, XP047487301, DOI: 10.1007/978-3-030-01421-6_57 *
史珩: "Quality Control Model for Crowdsourced Classification Data Based on Self-Paced Learning", China Master's Theses Full-text Database, Economics and Management Sciences, no. 4, 15 April 2018 (2018-04-15), pages 152-555 *
张志强 et al.: "Research on Crowdsourcing Quality Control Strategies and Evaluation Algorithms", Chinese Journal of Computers, vol. 36, no. 8, 15 August 2013 (2013-08-15), pages 1636-1649 *

Also Published As

Publication number Publication date
CN111339068B (en) 2024-04-19

Similar Documents

Publication Publication Date Title
CN110837410B (en) Task scheduling method and device, electronic equipment and computer readable storage medium
CN108833458B (en) Application recommendation method, device, medium and equipment
CN106484606A (en) Method and apparatus submitted to by a kind of code
US20190235987A1 (en) Duplicate bug report detection using machine learning algorithms and automated feedback incorporation
CN106372977B (en) A kind of processing method and equipment of virtual account
US11321359B2 (en) Review and curation of record clustering changes at large scale
CN109214634A (en) A kind of information processing method, device and information processing readable medium
CN111367782A (en) Method and device for automatically generating regression test data
CN110866000B (en) Data quality evaluation method and device, electronic equipment and storage medium
US20130013244A1 (en) Pattern based test prioritization using weight factors
CN111311276B (en) Identification method and device for abnormal user group and readable storage medium
CN111339068A (en) Crowdsourcing quality control method, apparatus, computer storage medium and computing device
CN109451332B (en) User attribute marking method and device, computer equipment and medium
CN109583691B (en) Electronic device, orphan list distribution method, and computer-readable storage medium
CN110262950A (en) Abnormal movement detection method and device based on many index
CN107395447A (en) Module detection method, power system capacity predictor method and corresponding equipment
CN111967798B (en) Method, device and equipment for distributing experimental samples and computer readable storage medium
CN112801551B (en) Test method, device, equipment and storage medium for online room selection system
CN112559347B (en) Test distribution method and apparatus, device, readable medium and computer program product
CN110851344B (en) Big data testing method and device based on complexity of calculation formula and electronic equipment
CN113159537A (en) Evaluation method and device for new technical project of power grid and computer equipment
CN114020420A (en) Distributed to-be-executed task execution method and system, storage medium and terminal
CN109344047B (en) System regression testing method, computer-readable storage medium, and terminal device
JP2020161044A (en) System, method, and program for managing data
CN111768130A (en) User allocation method and device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant