CN108197202B - Data verification method and device for crowdsourcing task, server and storage medium


Info

Publication number
CN108197202B
Authority
CN
China
Prior art keywords
answers
answer
users
user
preset
Prior art date
Legal status
Active
Application number
CN201711457649.6A
Other languages
Chinese (zh)
Other versions
CN108197202A (en)
Inventor
黄翠萍
柯海帆
李亚丹
Current Assignee
Baidu Online Network Technology Beijing Co Ltd
Original Assignee
Baidu Online Network Technology Beijing Co Ltd
Priority date
Filing date
Publication date
Application filed by Baidu Online Network Technology Beijing Co Ltd filed Critical Baidu Online Network Technology Beijing Co Ltd
Priority to CN201711457649.6A
Publication of CN108197202A
Application granted
Publication of CN108197202B

Classifications

    • G06F 16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; distributed database system architectures therefor
    • G06F 16/215: Improving data quality; data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G06F 21/64: Protecting data integrity, e.g. using checksums, certificates or signatures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiment of the invention discloses a data verification method, apparatus, server and storage medium for crowdsourcing tasks. The method comprises: distributing the same crowdsourcing task to a plurality of users to perform a data acquisition operation; obtaining the answers of the plurality of users; and performing answer verification according to the proportion of identical answers among the answers of the plurality of users, so as to determine the final answer of the crowdsourcing task. By distributing the same crowdsourcing task to a plurality of users to obtain a plurality of answers, and using those answers to cross-check one another and confirm whether they are correct, the final answer of the crowdsourcing task is obtained. This improves the accuracy and efficiency of data verification, reduces the cost of manual auditing, and solves the problems of low efficiency and low accuracy in existing data verification processes.

Description

Data verification method and device for crowdsourcing task, server and storage medium
Technical Field
Embodiments of the invention relate to data verification technology, and in particular to a data verification method and apparatus, a server, and a storage medium for crowdsourcing tasks.
Background
With the continuous development of the internet, collecting data through field crowdsourcing has attracted increasing attention. The data collection process involves content extraction and data verification (also called data checking, which aims to confirm whether the data is correct). Because the data volume is huge, manual data auditing has a long cycle, requires large manpower investment at high cost, and suffers from high personnel turnover, which easily causes a large backlog of data.
For example, for a Point of Interest (POI) attribute such as whether a location shown in a picture is accessible by car, a machine can neither extract such information from the picture nor verify whether the data is correct.
Disclosure of Invention
The embodiment of the invention provides a data verification method, a data verification device, a server and a storage medium for crowdsourcing tasks, which are used for improving the efficiency and accuracy of data verification and reducing the manual auditing cost.
In a first aspect, an embodiment of the present invention provides a data verification method for a crowdsourcing task, including:
distributing the same crowdsourcing task to a plurality of users to perform data acquisition operation;
obtaining answers of the plurality of users;
and performing answer verification according to the proportion of the same answers in the answers of the plurality of users, and determining the final answer of the crowdsourcing task.
In a second aspect, an embodiment of the present invention further provides a data verification apparatus for crowdsourcing tasks, including:
the task allocation module is used for allocating the same crowdsourcing task to a plurality of users to carry out data acquisition operation;
the answer obtaining module is used for obtaining answers of the users;
and the answer checking module is used for checking answers according to the proportion of the same answers in the answers of the plurality of users and determining the final answer of the crowdsourcing task.
In a third aspect, an embodiment of the present invention further provides a server, where the server includes:
one or more processors;
a memory for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement a data verification method for crowdsourcing tasks according to any embodiment of the invention.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a data verification method for a crowdsourcing task according to any embodiment of the present invention.
According to the technical solution of the embodiments of the invention, the same crowdsourcing task is distributed to a plurality of users to obtain a plurality of answers, and the answers are used to cross-check one another to confirm whether they are correct and to obtain the final answer of the crowdsourcing task. This improves the accuracy and efficiency of data verification, reduces the cost of manual auditing, and solves the problems of low efficiency and low accuracy in the existing data verification process. Moreover, preset stub questions are used to check the validity of users' answers, and the answers of untrusted users are removed, which improves data quality and further improves the accuracy of the final answer of the crowdsourcing task.
Drawings
Fig. 1 is a flowchart of a data verification method for crowdsourcing tasks according to an embodiment of the present invention;
Fig. 2 is a flowchart of a data verification method for crowdsourcing tasks according to a second embodiment of the present invention;
Fig. 3 is a specific flowchart of a data verification method for crowdsourcing tasks according to the second embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a data verification apparatus for crowdsourcing tasks according to a third embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a server according to a fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of a data verification method for crowdsourcing tasks according to an embodiment of the present invention. This embodiment is applicable to verifying data collected by a crowdsourcing task. The method may be performed by a data verification apparatus for crowdsourcing tasks, which may be implemented in software and/or hardware and is generally integrated in a server. As shown in Fig. 1, the method specifically includes:
and S110, distributing the same crowdsourcing task to a plurality of users to perform data acquisition operation.
The number of users to whom the same crowdsourcing task is assigned can be set as required. A crowdsourcing task typically includes multiple topics. The user performs the data acquisition operation by answering each topic in the crowdsourcing task; the answers are the data collected by the crowdsourcing task. When distributing crowdsourcing tasks, the tasks can be actively distributed according to information such as user IDs, user IP addresses and user history records, and pushed to users who meet the conditions; alternatively, a crowdsourcing task can be published on a crowdsourcing platform for users to claim. For example, a topic of a crowdsourcing task may be to extract attribute information such as a phone number, an address, and a time from a picture.
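By way of illustration, the assignment step can be sketched as follows. This is a minimal sketch, not the patented implementation: the User and Task shapes and the eligibility rule (requiring some task history and a known IP address) are assumptions made up for the example.

```python
# Minimal sketch of S110: pushing one crowdsourcing task to several
# eligible users. The data shapes and the eligibility rule are
# illustrative assumptions, not taken from the patent text.
from dataclasses import dataclass, field

@dataclass
class User:
    user_id: str
    ip_address: str
    completed_tasks: int      # simple stand-in for the user's history record

@dataclass
class Task:
    task_id: str
    topics: list
    assignees: list = field(default_factory=list)

def assign_task(task: Task, candidates: list, copies_needed: int) -> None:
    """Assign the same task to the first `copies_needed` eligible users."""
    for user in candidates:
        if len(task.assignees) >= copies_needed:
            break
        # Hypothetical eligibility filter based on IP and history;
        # a real system would use richer signals.
        if user.ip_address and user.completed_tasks > 0:
            task.assignees.append(user.user_id)
```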
S120, obtaining the answers of the plurality of users.
Obtaining the answers of the plurality of users means receiving the answers uploaded by those users, where each user uploads one copy of the answers.
S130, performing answer verification according to the proportion of the same answers in the answers of the multiple users, and determining the final answer of the crowdsourcing task.
For the same crowdsourcing task, multiple answers given by multiple users are obtained, and self-checking (also called cross-checking) is performed using the multiple answers to determine the final answer of the crowdsourcing task. The self-checking process requires no manual participation, which reduces labor cost. For example, if five answers are required for cross-checking, the crowdsourcing task may be assigned to five users.
According to this technical solution, the same crowdsourcing task is distributed to multiple users to obtain multiple answers, and the answers are cross-checked against one another to confirm whether they are correct and to obtain the final answer of the crowdsourcing task. This improves the accuracy and efficiency of data verification, reduces the cost of manual auditing, and solves the problems of low efficiency and low accuracy in the existing data verification process.
On the basis of the above technical solution, S130 may include: for each topic in the crowdsourcing task, determining the proportion of identical answers to the topic according to the answers of the plurality of users; if the proportion of identical answers to the topic exceeds a preset threshold, determining that identical answer as the final answer of the topic; and if the proportion of identical answers to the topic does not exceed the preset threshold, submitting the plurality of users' answers to the topic for manual verification.
A crowdsourcing task generally comprises multiple topics, and for each topic, cross-checking of the multiple answers is performed to determine the final answer of that topic. Cross-checking means comparing the multiple answers and taking the identical answer that meets the proportion requirement as the final answer. The preset threshold can be set according to actual requirements, for example to 50%: that is, if the proportion of an identical answer to a topic exceeds 50%, that answer is taken as the final answer of the topic.
If the proportion of identical answers to a topic does not exceed the preset threshold, the multiple answers to the topic are checked manually. Manual checking means that, when a final answer cannot be obtained through cross-checking, the multiple answers to the same topic are compared manually to determine the final answer; it can be understood as an auxiliary verification measure. It should be noted that, in the embodiments of the invention, most data can be audited automatically through cross-checking, so the probability that manual-assisted checking is required is relatively small. Adding manual-assisted checking on top of cross-checking further guarantees the comprehensiveness and accuracy of data verification.
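A minimal sketch of this per-topic cross-check is given below; the 50% threshold mirrors the example above, and the function signature is an assumption made for illustration.

```python
# Sketch of the cross-check for one topic: accept the majority answer if
# its share exceeds the threshold, otherwise flag for manual checking.
from collections import Counter

def cross_check(answers: list, threshold: float = 0.5):
    """Return (final_answer, needs_manual_check) for one topic."""
    if not answers:
        return None, True
    top_answer, count = Counter(answers).most_common(1)[0]
    if count / len(answers) > threshold:
        return top_answer, False      # majority answer becomes the final answer
    return None, True                 # no majority: manual-assisted check

# Example: three of five users agree, 60% > 50%, so "yes" is final.
final, needs_manual = cross_check(["yes", "yes", "no", "yes", "unknown"])
```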
Optionally, the topic types in the crowdsourcing task include at least one of: multiple-choice questions, true/false questions, short-answer questions, and fill-in-the-blank questions.
To reduce the difficulty of operating on crowdsourcing tasks and of data verification, multiple-choice and true/false questions can be used as the main topic types of a crowdsourcing task, ensuring that users' answers follow very simple processing specifications and standards. For topic types such as short-answer and fill-in-the-blank questions, when comparing the answers of different users, keywords need to be extracted from the answers, and whether the answers of different users are the same is determined by judging the similarity of the keywords. The specific similarity threshold can be set for different scenarios. For example, when filling in the description of a picture, different users' descriptive phrases differ, and the answers may be considered the same if the similarity exceeds 90%; for another example, when extracting an address or a telephone number from a picture, the answers are considered the same only if the similarity is 100%, because the specific digits of a house number or telephone number are involved.
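As a sketch of this comparison, the snippet below uses a generic string-similarity ratio in place of the keyword-similarity measure (which the text does not pin down to a concrete formula); the 90% and 100% thresholds mirror the two examples above.

```python
# Sketch of similarity-based matching for free-text answers. Using
# difflib's ratio is an assumption; the patent only calls for "keyword
# similarity" without naming a specific measure.
from difflib import SequenceMatcher

def answers_match(a: str, b: str, min_similarity: float) -> bool:
    """Treat two free-text answers as the same if similar enough."""
    return SequenceMatcher(None, a.strip(), b.strip()).ratio() >= min_similarity

# Descriptive text: small wording differences still count as a match.
answers_match("open-air parking lot", "open air parking lot", 0.9)  # True
# Phone numbers: one different digit must not match, so require 1.0.
answers_match("010-12345678", "010-12345679", 1.0)                  # False
```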
Example two
Fig. 2 is a flowchart of a data verification method for crowdsourcing tasks according to a second embodiment of the present invention. In this embodiment, based on the foregoing embodiment, a stub-check operation is added to S130 to remove the answers of untrusted users, improve data quality, and perform verification based only on the answers of trusted users. The stub check judges whether a user is trustworthy (that is, whether the user's answers are valid) according to preset stub questions in the crowdsourcing task. As shown in Fig. 2, the method specifically includes:
and S210, distributing the same crowdsourcing task to a plurality of users to perform data acquisition operation.
S220, obtaining answers of the multiple users.
S230, determining the trusted users among the plurality of users according to the preset stub questions in the crowdsourcing task.
The preset stub questions are used to verify whether a user is trustworthy, that is, whether the answers given by the user are valid or are fabricated data. Preset stub questions can be reused across multiple crowdsourcing tasks. Before the same crowdsourcing task is distributed to a plurality of users for the data acquisition operation, preset stub questions can be embedded in the crowdsourcing task, where the preset stub questions comprise a preset number of questions with standard answers. Stub questions are generally questions on which all users share the same understanding and which have definite answers, and they are usually chosen from simple, easily compared question types, such as multiple-choice questions. The number of stub questions can be set according to the specific situation; for example, 5 stub questions may be set.
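A sketch of embedding stub questions into a task might look as follows; the dict shape of a question and its field names are assumptions made for illustration.

```python
# Sketch of mixing a preset number of stub questions (with standard
# answers) into a task's topic list. Question dicts like
# {"id": "stub-01", "text": "...", "standard_answer": "A"} are an
# assumed representation, not defined by the patent.
import random

def embed_stub_questions(topics: list, stub_pool: list, num_stubs: int = 5) -> list:
    stubs = random.sample(stub_pool, num_stubs)   # reuse from a shared pool
    mixed = topics + stubs
    random.shuffle(mixed)                         # hide stubs among ordinary topics
    return mixed
```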
In one embodiment, whether a user is trustworthy may be determined by judging whether the user's answers to the stub questions are the same as the standard answers. Specifically, S230 includes: for each user among the plurality of users, extracting the answers to the preset stub questions from the user's answers; and if the extracted answers are the same as the standard answers of the preset stub questions, determining that the user is a trusted user. If at least one extracted answer differs from the standard answer of the corresponding preset stub question, the user is determined not to be a trusted user; that is, the user may have fabricated data, none of the user's answers to the topics in the crowdsourcing task can be trusted, and subsequent cross-checking must not use that user's answers.
It should be noted that, in addition to judging whether the answers given by a user are trustworthy based on whether the user's answers to the preset stub questions are correct, the user's operation time on the preset stub questions can also be used. Specifically, the user's actual operation time on each stub question is compared with the preset operation time of that stub question, and if the actual operation time of at least one stub question is far shorter than the preset operation time (for example, shorter than half of the preset operation time), the user is considered untrustworthy. The preset operation time can be derived from the time most users need to complete the question. For example, if the preset operation time of a stub question is 20 seconds and a user submits the answer in only 5 seconds, the user is considered untrustworthy. Of course, trustworthiness can also be judged by combining the standard answers of the preset stub questions with the preset operation time: a user is considered trustworthy only if the user's answers to the stub questions match the standard answers and the actual operation time satisfies the preset operation time.
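The combined check can be sketched as below; the mapping from stub-question id to an (answer, seconds) pair is an assumed data shape, and the half-of-preset-time rule follows the example above.

```python
# Sketch of the stub check combining both signals: the answer must equal
# the standard answer, and the actual operation time must not be far
# shorter than the preset time (here: at least half of it).
def is_trusted(user_answers: dict, stub_questions: list) -> bool:
    """user_answers maps stub id -> (answer, seconds_taken); assumed shape."""
    for stub in stub_questions:
        answer, seconds = user_answers.get(stub["id"], (None, 0.0))
        if answer != stub["standard_answer"]:
            return False                        # wrong stub answer
        if seconds < stub["preset_seconds"] / 2:
            return False                        # implausibly fast, e.g. 5 s vs 20 s
    return True
```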
S240, if the number of trusted users does not reach a preset number, continuing to distribute the crowdsourcing task.
The preset number is the number of users, set in advance, who participate in cross-checking. In S210, the crowdsourcing task may be allocated to exactly that number of users, or to more users. For example, if five answers are needed for cross-checking, the crowdsourcing task may be assigned to five users, or to seven users, so as to reduce the influence of untrusted users on the cross-checking process.
If the number of trusted users does not reach the preset number, cross-checking cannot start, and the crowdsourcing task needs to be redistributed so that answers from other users can be obtained and put through the stub check.
S250, if the number of trusted users reaches the preset number, performing answer verification according to the proportion of identical answers among the trusted users' answers, and determining the final answer of the crowdsourcing task. For the specific answer verification process, refer to the description of the first embodiment, which is not repeated here.
According to this technical solution, the same crowdsourcing task is distributed to a plurality of users to obtain a plurality of answers, the preset stub questions are used to check the validity of the users' answers, and the answers of untrusted users are removed, which improves data quality and further improves the accuracy of the final answer of the crowdsourcing task.
In one embodiment, the preset stub questions may be updated periodically. Regularly updating the stub questions keeps them unpredictable while still allowing them to be reused, and prevents users from fabricating data once they have learned which questions are stubs.
The data verification process of this embodiment is explained below with reference to Fig. 3. Before the crowdsourcing task is distributed, preset stub questions are embedded in it. As shown in Fig. 3, the method includes:
S310, receiving a crowdsourcing task answer submitted by a user.
S320, extracting the user's answers to the stub questions from the user's submitted answers.
S330, comparing the user's answers to the stub questions with the standard answers of the stub questions, and judging whether the stub check is passed. If yes, the user is trusted and the process goes to S340; if not, the user is untrusted and the process goes to S350.
S340, the user's answers to the crowdsourcing task are valid and enter the queue waiting for cross-checking. When the number of valid answer copies reaches the preset number (that is, the number of trusted users reaches the preset number), the crowdsourcing task is closed to further users and cross-checking starts.
S350, the user's answers to the crowdsourcing task are invalid; if the valid answers have not reached the preset number of copies, the crowdsourcing task needs to be distributed again.
S360, judging whether the proportion of identical answers among the users' answers exceeds the preset threshold. Taking 50% as an example, this means judging whether more than half of the users' answers are the same. If so, the process goes to S370; if not, it goes to S380.
S370, the cross-check is completed, and the identical answer given by the plurality of users becomes the final answer of the topic.
S380, collecting the users' answers and entering the manual-assisted verification step.
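Condensing the flow of Fig. 3 into code, a sketch under assumed data shapes (submission dicts with per-topic answers and per-stub answers) might look like this:

```python
# Sketch of the S310-S380 pipeline: stub check, wait for enough valid
# copies, then per-topic cross-check with a manual-review fallback.
from collections import Counter

def verify_task(submissions: list, stubs: dict, copies_needed: int = 5,
                threshold: float = 0.5):
    """submissions: [{"answers": {topic_id: ans}, "stub_answers": {stub_id: ans}}];
    stubs: {stub_id: standard_answer}. These shapes are illustrative assumptions."""
    # S320/S330: keep only submissions whose stub answers are all correct.
    valid = [s for s in submissions
             if all(s["stub_answers"].get(sid) == ans for sid, ans in stubs.items())]

    # S350: not enough valid copies yet, so the task must be reassigned.
    if len(valid) < copies_needed:
        return {}, [], True

    # S360-S380: cross-check each topic over the valid answers.
    final, manual = {}, []
    for topic_id in valid[0]["answers"]:
        top, count = Counter(s["answers"][topic_id] for s in valid).most_common(1)[0]
        if count / len(valid) > threshold:
            final[topic_id] = top         # S370: majority answer accepted
        else:
            manual.append(topic_id)       # S380: route to manual-assisted check
    return final, manual, False
```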
Example three
Fig. 4 is a schematic structural diagram of a data verification apparatus for crowdsourcing tasks according to a third embodiment of the present invention. As shown in Fig. 4, the apparatus includes: a task allocation module 410, an answer obtaining module 420, and an answer checking module 430.
A task allocation module 410, configured to allocate the same crowdsourcing task to multiple users for data acquisition;
an answer obtaining module 420, configured to obtain answers of the multiple users;
the answer checking module 430 is configured to perform answer checking according to a ratio of the same answer among the answers of the multiple users, and determine a final answer of the crowdsourcing task.
Optionally, the topic types in the crowdsourcing task include at least one of: multiple-choice questions, true/false questions, short-answer questions, and fill-in-the-blank questions.
Further, the answer checking module 430 includes:
a proportion determining unit, configured to determine, for each topic in the crowdsourcing task, the proportion of identical answers to the topic according to the answers of the plurality of users;
an answer determining unit, configured to determine that identical answer as the final answer of the topic when the proportion of identical answers to the topic exceeds a preset threshold;
and an answer submitting unit, configured to submit the plurality of users' answers to the topic for manual verification when the proportion of identical answers to the topic does not exceed the preset threshold.
Optionally, the answer checking module 430 includes a trusted user determining unit and an answer checking unit.
The trusted user determining unit is configured to determine the trusted users among the plurality of users according to the preset stub questions in the crowdsourcing task.
The task allocation module 410 is further configured to continue distributing the crowdsourcing task when the number of trusted users does not reach a preset number.
The answer checking unit is configured to perform answer checking according to the proportion of identical answers among the trusted users' answers when the number of trusted users reaches the preset number, and to determine the final answer of the crowdsourcing task.
The trusted user determining unit includes:
an answer extracting subunit, configured to, for each user among the plurality of users, extract the answers to the preset stub questions from the answers of the user;
and a trusted user determining subunit, configured to determine that the user is a trusted user when the extracted answers are the same as the standard answers of the preset stub questions.
Optionally, the apparatus further comprises a stub question setting module, configured to set preset stub questions in a crowdsourcing task before the same crowdsourcing task is distributed to a plurality of users to perform the data acquisition operation, where the preset stub questions comprise a preset number of questions with standard answers.
Further, the apparatus may further comprise a stub question updating module, configured to periodically update the preset stub questions.
The data verification apparatus for crowdsourcing tasks provided by this embodiment can execute the data verification method for crowdsourcing tasks provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects. For technical details not described in this embodiment, refer to the data verification method for crowdsourcing tasks provided in any embodiment of the present invention.
Example four
The present embodiment provides a server, including:
one or more processors;
a memory for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement a data verification method for crowdsourcing tasks according to any embodiment of the invention.
Fig. 5 is a schematic structural diagram of a server according to a fourth embodiment of the present invention. Fig. 5 illustrates a block diagram of an exemplary server 12 suitable for use in implementing embodiments of the present invention. The server 12 shown in Fig. 5 is only an example and should not limit the functions or scope of use of the embodiments of the present invention.
As shown in FIG. 5, the server 12 is in the form of a general purpose computing device. The components of the server 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
The server 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by server 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. The server 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 5, and commonly referred to as a "hard drive"). Although not shown in FIG. 5, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. System memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in system memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of the described embodiments of the invention.
The server 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with the server 12, and/or with any devices (e.g., network card, modem, etc.) that enable the server 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Also, the server 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) via the network adapter 20. As shown in FIG. 5, the network adapter 20 communicates with the other modules of the server 12 via the bus 18. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the server 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processing unit 16 executes various functional applications and data processing, such as implementing a data verification method for crowdsourcing tasks provided by embodiments of the present invention, by running programs stored in the system memory 28.
Example five
An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a data verification method for a crowdsourcing task according to any embodiment of the present invention.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (13)

1. A data verification method for crowdsourcing tasks is characterized by comprising the following steps:
distributing the same crowdsourcing task to a plurality of users who meet conditions according to user IDs, user IP addresses and user history records, to perform a data acquisition operation;
obtaining answers of the plurality of users;
performing answer verification according to the proportion of the same answers in the answers of the multiple users, and determining the final answer of the crowdsourcing task;
wherein performing answer verification according to the proportion of identical answers among the answers of the plurality of users and determining the final answer of the crowdsourcing task comprises:
determining trusted users among the plurality of users according to preset stub questions in the crowdsourcing task, comprising: comparing a user's actual operation time on the stub questions with the preset operation time of the stub questions, and determining that the user is untrusted if the actual operation time of at least one stub question is lower than the preset operation time;
if the number of trusted users does not reach a preset number, continuing to distribute the crowdsourcing task, wherein the preset number is not less than the number of users preset to participate in cross-checking;
and if the number of trusted users reaches the preset number, performing answer verification according to the proportion of identical answers among the trusted users' answers, and determining the final answer of the crowdsourcing task.
2. The method of claim 1, wherein performing answer checking according to a proportion of identical answers in the answers of the plurality of users, and determining a final answer for the crowdsourcing task comprises:
for each topic in the crowdsourcing task, determining the proportion of identical answers to the topic according to the answers of the plurality of users;
if the proportion of identical answers to the topic exceeds a preset threshold, determining that identical answer as the final answer of the topic;
and if the proportion of identical answers to the topic does not exceed the preset threshold, submitting the plurality of users' answers to the topic for manual verification.
3. The method of claim 1, wherein determining trusted users among the plurality of users according to preset stub questions in the crowdsourcing task comprises:
for each user among the plurality of users, extracting answers to the preset stub questions from the answers of the user;
and if the extracted answers are the same as the standard answers of the preset stub questions, determining that the user is a trusted user.
4. The method of claim 1, wherein before distributing the same crowdsourcing task to a plurality of users to perform a data acquisition operation, the method further comprises:
setting preset stub questions in the crowdsourcing task, wherein the preset stub questions comprise a preset number of questions with standard answers.
5. The method of claim 4, wherein after setting the preset stub questions in the crowdsourcing task, the method further comprises:
periodically updating the preset stub questions.
6. The method of any of claims 1-5, wherein the topic types in the crowdsourcing task comprise at least one of: multiple-choice questions, true/false questions, short-answer questions, and fill-in-the-blank questions.
7. A data verification apparatus for crowdsourcing tasks, comprising:
the task allocation module is used for distributing the same crowdsourcing task to a plurality of users who meet conditions according to user IDs, user IP addresses and user history records, to perform a data acquisition operation;
the answer obtaining module is used for obtaining answers of the users;
the answer checking module is used for checking answers according to the proportion of the same answers in the answers of the plurality of users and determining the final answer of the crowdsourcing task;
wherein the answer checking module comprises a trusted user determining unit and an answer checking unit;
the trusted user determining unit is configured to determine trusted users among the plurality of users according to preset stub questions in the crowdsourcing task, by comparing a user's actual operation time on the stub questions with the preset operation time of the stub questions and determining that the user is untrusted if the actual operation time of at least one stub question is lower than the preset operation time;
the task allocation module is further configured to continue distributing the crowdsourcing task when the number of trusted users does not reach a preset number, wherein the preset number is not less than the number of users preset to participate in cross-checking;
and the answer checking unit is configured to perform answer checking according to the proportion of identical answers among the trusted users' answers when the number of trusted users reaches the preset number, and to determine the final answer of the crowdsourcing task.
8. The apparatus of claim 7, wherein the answer checking module comprises:
a proportion determining unit, configured to determine, for each topic in the crowdsourcing task, the proportion of identical answers to the topic according to the answers of the plurality of users;
an answer determining unit, configured to determine that identical answer as the final answer of the topic when the proportion of identical answers to the topic exceeds a preset threshold;
and an answer submitting unit, configured to submit the plurality of users' answers to the topic for manual verification when the proportion of identical answers to the topic does not exceed the preset threshold.
9. The apparatus of claim 7, wherein the trusted user determining unit comprises:
an answer extracting subunit, configured to, for each user among the plurality of users, extract the answers to the preset stub questions from the answers of the user;
and a trusted user determining subunit, configured to determine that the user is a trusted user when the extracted answers are the same as the standard answers of the preset stub questions.
10. The apparatus of claim 7, further comprising:
the device comprises a blind pile question setting module and a blind pile question setting module, wherein the blind pile question setting module is used for setting a preset blind pile question in a crowdsourcing task before distributing the same crowdsourcing task to a plurality of users to perform data acquisition operation, and the preset blind pile question comprises a preset number of questions with standard answers.
11. The apparatus of claim 10, further comprising:
a stub question updating module, configured to periodically update the preset stub questions.
12. A server, characterized in that the server comprises:
one or more processors;
a memory for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the data verification method for a crowdsourcing task as recited in any one of claims 1 to 6.
13. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out a data verification method for a crowdsourcing task as claimed in any one of claims 1 to 6.
CN201711457649.6A 2017-12-28 2017-12-28 Data verification method and device for crowdsourcing task, server and storage medium Active CN108197202B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711457649.6A CN108197202B (en) 2017-12-28 2017-12-28 Data verification method and device for crowdsourcing task, server and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711457649.6A CN108197202B (en) 2017-12-28 2017-12-28 Data verification method and device for crowdsourcing task, server and storage medium

Publications (2)

Publication Number Publication Date
CN108197202A CN108197202A (en) 2018-06-22
CN108197202B (en) 2021-12-24

Family

Family ID: 62585135

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711457649.6A Active CN108197202B (en) 2017-12-28 2017-12-28 Data verification method and device for crowdsourcing task, server and storage medium

Country Status (1)

Country Link
CN (1) CN108197202B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109583933B (en) * 2018-10-09 2022-07-08 顺丰科技有限公司 Address information judging method, device, equipment and storage medium thereof
CN109471943B (en) * 2018-11-12 2024-06-07 平安科技(深圳)有限公司 Crowd-sourced task answer determining method and related equipment based on data processing
CN109582581B (en) * 2018-11-30 2023-08-25 平安科技(深圳)有限公司 Result determining method based on crowdsourcing task and related equipment
CN111382144B (en) * 2018-12-27 2023-05-02 阿里巴巴集团控股有限公司 Information processing method and device, storage medium and processor
CN110096525A (en) * 2019-04-29 2019-08-06 北京三快在线科技有限公司 Calibrate method, apparatus, equipment and the storage medium of interest point information
CN110287385A (en) * 2019-06-18 2019-09-27 素朴网联(珠海)科技有限公司 A kind of corpus data acquisition method, system and storage medium
CN113268621B (en) * 2020-02-17 2024-04-30 百度在线网络技术(北京)有限公司 Picture sorting method and device, electronic equipment and storage medium
KR102195606B1 (en) * 2020-03-23 2020-12-29 주식회사 크라우드웍스 Method for improving reliability by selective self check of worker of crowdsourcing based project for artificial intelligence training data generation
KR102195964B1 (en) * 2020-03-27 2020-12-29 주식회사 크라우드웍스 Method for operating self check process of worker of crowdsourcing based project for artificial intelligence training data generation
CN111832956A (en) * 2020-07-20 2020-10-27 北京百度网讯科技有限公司 Data verification method, device, electronic equipment and medium
CN112508400B (en) * 2020-12-04 2021-10-08 云南大学 Self-generation method of crowdsourcing collaborative iteration task
CN113868538B (en) * 2021-10-19 2024-04-09 北京字跳网络技术有限公司 Information processing method, device, equipment and medium


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106446287A (en) * 2016-11-08 2017-02-22 北京邮电大学 Answer aggregation method and system facing crowdsourcing scene question-answering system
CN106844723B (en) * 2017-02-10 2019-09-10 厦门大学 Medical knowledge base construction method based on question answering system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105426826A (en) * 2015-11-09 2016-03-23 张静 Tag noise correction based crowd-sourced tagging data quality improvement method
CN106228294A (en) * 2016-07-18 2016-12-14 合肥赑歌数据科技有限公司 A kind of search engine evaluation system and management
CN106529521A (en) * 2016-10-31 2017-03-22 江苏文心古籍数字产业有限公司 Ancient book character digital recording method
CN107194800A * 2017-05-08 2017-09-22 深圳市华傲数据技术有限公司 A data verification system and method based on crowdsourcing
CN107273492A * 2017-06-15 2017-10-20 复旦大学 An interaction method for processing image annotation tasks based on a crowdsourcing platform

Also Published As

Publication number Publication date
CN108197202A (en) 2018-06-22

Similar Documents

Publication Publication Date Title
CN108197202B (en) Data verification method and device for crowdsourcing task, server and storage medium
CN109714636B (en) User identification method, device, equipment and medium
CN107784205B (en) User product auditing method, device, server and storage medium
CN109885597B (en) User grouping processing method and device based on machine learning and electronic terminal
WO2019085466A1 (en) Association test method and system, application server, and computer readable storage medium
CN112181835B (en) Automatic test method, device, computer equipment and storage medium
CN110348471B (en) Abnormal object identification method, device, medium and electronic equipment
CN107766726A (en) Application security detection method and device
CN113177701A (en) User credit assessment method and device
EP3734568A1 (en) Data extraction method and device
KR20190094096A (en) Document information input methods, devices, servers, and storage media
CN111402034A (en) Credit auditing method, device, equipment and storage medium
CN109460226B (en) Test certificate image generation method, device, equipment and storage medium
CN111311393A (en) Credit risk assessment method, device, server and storage medium
CN111260479A (en) Financing application processing method, device, equipment and storage medium
CN113225325B (en) IP (Internet protocol) blacklist determining method, device, equipment and storage medium
CN112907040B (en) Event processing method, device, equipment and storage medium
CN115016890A (en) Virtual machine resource allocation method and device, electronic equipment and storage medium
CN114240476A (en) Abnormal user determination method, device, equipment and storage medium
CN111260512B (en) Background course autonomous distribution method and device
CN107958142B (en) User account generation method and device
CN111259689B (en) Method and device for transmitting information
CN108600113B (en) Preliminary auditing method and device for data to be issued and storage medium
CN111045823A (en) Context data load distribution method, device, equipment and storage medium
CN113891109B (en) Adaptive noise reduction method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant