WO2021174829A1 - Crowdsourced task inspection method, apparatus, computer device, and storage medium

Info

Publication number: WO2021174829A1
Application number: PCT/CN2020/118461
Authority: WO
Inventors: 王健宗; 李佳琳
Original assignee: 平安科技（深圳）有限公司
Priority date: 2020-03-02
Filing date: 2020-09-28
Publication date: 2021-09-10
Also published as: CN111460810A

Abstract

A crowdsourced task inspection method, an apparatus, a computer device, and a storage medium, applied in the field of big data databases. The method comprises: for each historical crowdsourced task, obtaining each responder that participated in the task and the response answer corresponding to each responder (S1); parsing the response answers to obtain inspection tokens, extracting inspection keywords from the inspection tokens, and storing the inspection keywords in a preset answer word base (S2); for the inspection tokens in each response answer of each responder, counting the number of times the inspection tokens hit inspection keywords in the preset answer word base, which serves as the base frequency corresponding to that response answer (S3); determining a reliability value corresponding to each responder according to the base frequencies corresponding to each response answer (S4); and determining inspection targets according to the reliability values and performing evaluation operations on the response answers corresponding to the inspection targets (S5). By extracting inspection keywords, the method makes inspection more targeted and improves its effectiveness.

Description

Sampling inspection method, device, computer equipment and storage medium for crowdsourcing tasks

This application claims priority to Chinese patent application No. 202010134385.6, filed with the Chinese Patent Office on March 02, 2020 and entitled "Sampling inspection methods, devices, computer equipment and storage media for crowdsourcing tasks", the entire contents of which are incorporated in this application by reference.

Technical field

This application relates to the field of data processing technology, and in particular to sampling inspection methods, devices, computer equipment and storage media for crowdsourced tasks.

Background art

With the rapid development of network technology, companies and institutions that want to obtain more creative information, or to solve cross-domain problems efficiently and conveniently, often publish crowdsourcing tasks to Internet users and solve these problems through the tasks.

Crowdsourcing refers to the practice of a company or organization outsourcing work that was previously performed by employees to an unspecified (and usually large) group of people on the network, on a free and voluntary basis. Users of a crowdsourcing platform fall into two categories: the person who publishes a task on the platform is called the task publisher, and the person who completes the task is called the respondent. The task publisher publishes the task on the platform, and the respondent receives a reward for completing it. Crowdsourcing helps task publishers reach a large number of independent participants and solve practical problems by drawing on their collective wisdom.

At present, the inventor has found that, because respondents' fields of expertise and levels of professionalism are uncertain, the correctness of the answers collected for crowdsourcing tasks needs to be spot-checked. However, when a large number of respondents participate, that is, when there are many response answers, inspection takes a long time. The current approach is to randomly select a preset number of response answers from all the response answers, inspect them, and evaluate the crowdsourcing task according to the inspection results. The inventor found that this random sampling method is poorly targeted, which makes the evaluation of crowdsourcing tasks unsatisfactory and the sampling inspection inefficient. How to select answers in a targeted manner, and thereby improve the efficiency of sampling inspection for crowdsourcing tasks, has become an urgent problem to be solved.

Technical problem

The purpose of the embodiments of the present application is to propose a sampling inspection method for crowdsourcing tasks, so as to solve the problem that the random sampling method in the prior art is poorly targeted, which results in low sampling inspection efficiency for crowdsourcing tasks.

Technical solutions

In order to solve the above technical problems, an embodiment of the present application provides a sampling method for crowdsourcing tasks, including:

For each historical crowdsourcing task, obtaining each respondent participating in the historical crowdsourcing task and the response answer corresponding to each respondent;

Parsing the response answers to obtain sampling word segments, extracting sampling keywords from the sampling word segments, and storing the sampling keywords in a preset answer word database;

For the sampling word segments in each response answer of each respondent, counting the number of times the sampling word segments hit sampling keywords in the preset answer word database, as the base frequency corresponding to each response answer;

Determining the reliability value corresponding to each respondent according to the base frequency corresponding to each response answer;

Selecting a preset number of respondents as sampling objects in ascending order of reliability value, and inspecting the response answers corresponding to the sampling objects.

In order to solve the above technical problems, a technical solution adopted in this application is to provide a sampling device for crowdsourcing tasks, including:

The obtaining module is used to obtain, for each historical crowdsourcing task, the response answer of each respondent corresponding to the historical crowdsourcing task;

The parsing module is used to parse the response answers to obtain sampling word segments, extract sampling keywords from the sampling word segments, and store the sampling keywords in a preset answer word database;

The statistics module is used to count, for the sampling word segments in each response answer of each respondent, the number of times the sampling word segments hit sampling keywords in the preset answer word database, as the base frequency corresponding to each response answer;

The determination module is used to determine the reliability value corresponding to each respondent according to the base frequency corresponding to each response answer;

The selection module is used to select a preset number of respondents as sampling objects in ascending order of reliability value, and to inspect the response answers corresponding to the sampling objects.

In order to solve the above technical problems, a technical solution adopted in this application is to provide a computer device including a memory and a processor, where the memory stores computer-readable instructions that run on the processor, and the processor implements the following steps of the sampling inspection method for crowdsourcing tasks when executing the computer-readable instructions:

For the sampling word segments in each response answer of each respondent, counting the number of times the sampling word segments hit sampling keywords in the preset answer word database, as the base frequency corresponding to each response answer;

In order to solve the above technical problems, a technical solution adopted in this application is to provide a computer-readable storage medium that stores computer-readable instructions, where the computer-readable instructions, when executed by a processor, implement the following steps of the sampling inspection method for crowdsourcing tasks:

Beneficial effect

In the sampling inspection method for crowdsourcing tasks of the above scheme, for each historical crowdsourcing task, the response answer of each respondent corresponding to the task is obtained; the response answers are parsed to obtain sampling word segments; and sampling keywords are extracted from the sampling word segments and stored in the preset answer word database. The sampling keywords obtained are used for the subsequent evaluation of each respondent's reliability value, which makes that evaluation more targeted. At the same time, for the sampling word segments in each response answer of each respondent, the number of times the sampling word segments hit the sampling keywords is counted as the base frequency corresponding to each response answer; the reliability value corresponding to each respondent is determined according to the base frequency; the sampling objects are then determined according to the reliability values, and the response answers corresponding to the sampling objects are inspected. Determining a reliability value for each respondent and choosing sampling objects according to it makes sampling more targeted, and ordering respondents by reliability value helps improve sampling efficiency.

Description of the drawings

In order to explain the solution of this application more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the application; those of ordinary skill in the art can obtain other drawings from these drawings without creative work.

FIG. 1 is a schematic diagram of the application environment of the sampling inspection method for crowdsourcing tasks provided by an embodiment of the present application;

FIG. 2 is an implementation flowchart of the sampling inspection method for crowdsourcing tasks provided by an embodiment of the present application;

FIG. 3 is an implementation flowchart of step S2 in the sampling inspection method for crowdsourcing tasks provided by an embodiment of the present application;

FIG. 4 is an implementation flowchart of step S221 in the sampling inspection method for crowdsourcing tasks provided by an embodiment of the present application;

FIG. 5 is an implementation flowchart of step S3 in the sampling inspection method for crowdsourcing tasks provided by an embodiment of the present application;

FIG. 6 is an implementation flowchart of step S4 in the sampling inspection method for crowdsourcing tasks provided by an embodiment of the present application;

FIG. 7 is a schematic diagram of a sampling inspection device for crowdsourcing tasks provided by an embodiment of the present application;

FIG. 8 is a schematic diagram of a computer device provided by an embodiment of the present application.

The best mode of the present invention

Please refer to FIG. 3, which shows a specific implementation of step S2. In step S2, the response answers are parsed to obtain sampling word segments, sampling keywords are extracted from the sampling word segments, and the sampling keywords are stored in the preset answer word database. The specific implementation process is described in detail as follows:

S21: Use the dynamic programming algorithm to perform word segmentation processing on the response answer to obtain the initial word segmentation.

Specifically, a respondent's historical response answers are often complicated and need to be simplified. The dynamic programming algorithm is used to segment the response answers and extract the word segments relevant to the sampling inspection, yielding the initial word segmentation.

Among them, the dynamic programming algorithm is usually used to solve problems with certain optimal properties; in this application, the dynamic programming algorithm is used to obtain the optimal initial word segmentation. The basic idea of the dynamic programming algorithm is to decompose the problem to be solved into several sub-problems, first solve the sub-problems, and then obtain the solution of the original problem from the solutions of these sub-problems.

Preferably, the Viterbi algorithm is used to perform word segmentation processing on the response answer to obtain the initial word segmentation. The Viterbi algorithm is a dynamic programming algorithm that finds the most likely sequence of hidden states to explain a sequence of observations.
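The dynamic programming idea in S21 can be illustrated with a minimal sketch. It assumes a hypothetical unigram dictionary `word_probs` (word to probability) and picks the split with the highest total log-probability; the function name, `max_len`, and `unknown_logp` are illustrative assumptions, and this is a simplified stand-in rather than the specific Viterbi model used in the application.

```python
import math

def segment(text, word_probs, max_len=4, unknown_logp=-20.0):
    """Split text into the word sequence with the highest total log-probability."""
    n = len(text)
    # best[i] = (best score for text[:i], start index of the last word)
    best = [(0.0, 0)] + [(-math.inf, 0)] * n
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in word_probs:
                logp = math.log(word_probs[word])
            elif len(word) == 1:
                logp = unknown_logp  # assumed penalty for an unknown single character
            else:
                continue
            score = best[start][0] + logp
            if score > best[end][0]:
                best[end] = (score, start)
    # Walk the back-pointers from the end to recover the segmentation.
    words, end = [], n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]
```

Each prefix of the text is a sub-problem: the best split of `text[:end]` is built from the best split of a shorter prefix plus one final word, which is the decomposition described above.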

S22: Perform filtering processing on the initial word segmentation, and perform synonym replacement on the filtered initial word segmentation to obtain sampling word segments.

Specifically, the initial word segmentation obtained from the parsing is filtered to remove unnecessary or redundant words, leaving word segments that better fit the sampling needs; synonym replacement is then performed on the filtered initial word segmentation to obtain the sampling word segments. The purpose of synonym replacement is to convert synonyms and near-synonyms into a consistent standard vocabulary, which further simplifies the word segmentation.

Among them, the filtering process refers to filtering out initial word segments such as adjectives and adverbs that contribute little to the semantic description, and retaining initial word segments that play a key role in semantic expression, such as nouns, verbs, and quantifiers.

Among them, synonym replacement refers to converting synonyms and near-synonyms into a unified standard vocabulary for representation.

For example, in a specific embodiment, the initial word segments obtained are "Zhang San", "Zheng", "Crazy", and "打 CALL". "Zheng" and "Crazy" are both adverbs and can be filtered out; "Zhang San" is a specific person's name, and "打 CALL" is a specific action. Therefore, after the filtering process, the two word segments "Zhang San" and "打 CALL" remain. Synonym replacement is then performed on "打 CALL", and the sampling word segments "Zhang San" and "Cheers" are obtained.
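The filtering and replacement in S22 might look like the following sketch. The part-of-speech tags, the `KEEP_TAGS` set, and the `SYNONYMS` table are hypothetical inputs chosen to reproduce the example above; a real system would obtain tags from a tagger and synonyms from a curated dictionary.

```python
# Hypothetical tag set and synonym table, for illustration only.
KEEP_TAGS = {"noun", "verb", "quantifier"}
SYNONYMS = {"打 CALL": "Cheers"}

def filter_and_normalize(tagged_words):
    """Drop words whose tag is not kept, then map synonyms to a standard word."""
    kept = [word for word, tag in tagged_words if tag in KEEP_TAGS]
    return [SYNONYMS.get(word, word) for word in kept]

tagged = [("Zhang San", "noun"), ("Zheng", "adverb"),
          ("Crazy", "adverb"), ("打 CALL", "verb")]
sampling_words = filter_and_normalize(tagged)
```

With the tags assumed here, `sampling_words` holds "Zhang San" and "Cheers", matching the example in the text.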

S23: For the same historical crowdsourcing task, obtain all the sampling word segmentation corresponding to the historical crowdsourcing task, and count the number of occurrences of each sampling word segmentation to obtain the sampling word frequency corresponding to the sampling word segmentation.

Specifically, for the same historical crowdsourcing task, obtain all the sampled word segmentation corresponding to the historical crowdsourcing task, and count the number of occurrences of each sampled word segmentation to obtain the sampled word frequency corresponding to the sampled word segmentation in the historical crowdsourcing task.

Among them, the sampling word frequency is the frequency with which a sampling word segment occurs in the same historical crowdsourcing task. It can be expressed either as the number of occurrences of the sampling word segment or as the proportion of its occurrences, and can be set according to the actual situation.

For example, in a specific implementation, a historical crowdsourcing task corresponds to 8 sampling word segments: Participle 1, Participle 2, Participle 3, Participle 4, Participle 5, Participle 6, Participle 7, and Participle 8. The numbers of occurrences of these 8 sampling word segments are 22, 5, 19, 25, 8, 1, 20, and 2 respectively, so the corresponding sampling word frequencies are 22, 5, 19, 25, 8, 1, 20, and 2.
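As a rough illustration of S23, the sketch below counts sampling word segments across all answers to one task and supports both representations mentioned above (raw count or proportion). The function name, the `as_ratio` flag, and the list-of-token-lists input shape are assumptions made for this example.

```python
from collections import Counter

def sampling_word_frequencies(answers_tokens, as_ratio=False):
    """Count how often each sampling word segment occurs across one task's answers."""
    counts = Counter(token for tokens in answers_tokens for token in tokens)
    if not as_ratio:
        return dict(counts)
    total = sum(counts.values())
    # Proportion of all occurrences; the comprehension is empty when total is 0.
    return {token: count / total for token, count in counts.items()}
```

For instance, two answers tokenized as `["a", "b"]` and `["a"]` give a count of 2 for "a" and 1 for "b", or proportions of 2/3 and 1/3 with `as_ratio=True`.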

S24: Use the sampled word segmentation whose frequency of the sampled word is greater than the preset word frequency as the sampled keyword, and store the sampled keyword in the preset answer word database.

Specifically, the server stores the preset word frequency and compares each sampling word frequency with it. When a sampling word frequency is greater than the preset word frequency, the sampling word segment corresponding to that sampling word frequency is used as a sampling keyword and stored in the preset answer word database.

Among them, the preset word frequency can be set according to actual sampling needs.
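The threshold comparison in S24 can be sketched as follows, reusing the eight frequencies from the earlier example. The preset word frequency of 10 is an assumed value chosen only for illustration, and the function name is hypothetical.

```python
def extract_keywords(word_freqs, preset_frequency):
    """Keep the word segments whose frequency is strictly greater than the threshold."""
    return {word for word, freq in word_freqs.items() if freq > preset_frequency}

freqs = {f"Participle {i}": f
         for i, f in enumerate([22, 5, 19, 25, 8, 1, 20, 2], start=1)}
keywords = extract_keywords(freqs, preset_frequency=10)  # 10 is an assumed threshold
```

Under that assumed threshold, Participles 1, 3, 4, and 7 become sampling keywords; a higher or lower preset word frequency would change the set.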

In this embodiment, performing word segmentation processing on the response answers yields the initial word segmentation relevant to sampling; filtering the initial word segmentation and performing synonym replacement further simplifies it into the sampling word segments; and extracting keywords from the sampling word segments then yields more accurate sampling keywords.

Embodiments of the present invention

Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field of the application; the terms used in the specification of the application are only for the purpose of describing specific embodiments and are not intended to limit the application; the terms "including" and "having" and any variations thereof, in the specification and claims of the application and in the above description of the drawings, are intended to cover non-exclusive inclusion. The terms "first", "second", etc. in the specification and claims of the present application or the above-mentioned drawings are used to distinguish different objects, rather than to describe a specific sequence.

The reference to "embodiments" herein means that a specific feature, structure, or characteristic described in conjunction with the embodiments may be included in at least one embodiment of the present application. The appearance of the phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment that is mutually exclusive with other embodiments. Those skilled in the art understand, both explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.

In order to enable those skilled in the art to better understand the solutions of the present application, the technical solutions in the embodiments of the present application will be described clearly and completely in conjunction with the accompanying drawings.

The application will be described in detail below with reference to the drawings and implementations.

Referring to FIG. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104 and a server 105. The network 104 is used to provide a medium for communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, and so on.

The user can use the terminal devices 101, 102, and 103 to interact with the server 105 through the network 104 to receive or send messages and so on. Various communication client applications, such as web browser applications, search applications, instant messaging tools, etc., may be installed on the terminal devices 101, 102, and 103.

The terminal devices 101, 102, 103 may be various electronic devices with display screens and supporting web browsing, including but not limited to smart phones, tablet computers, laptop computers, desktop computers, and so on.

The server 105 may be a server that provides various services, for example, a background server that provides support for pages displayed on the terminal devices 101, 102, and 103.

It should be noted that the sampling method for crowdsourcing tasks provided by the embodiments of the present application is generally executed by a server. Accordingly, a sampling device for crowdsourcing tasks is generally set in the server.

It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. There can be any number of terminal devices, networks, and servers according to implementation needs.

Please refer to FIG. 2, which shows a specific implementation of the crowdsourced task sampling method.

It should be noted that, provided substantially the same result is obtained, the method of the present application is not limited to the sequence of the process shown in FIG. 2. The method includes the following steps:

S1: For each historical crowdsourcing task, obtain each response object participating in the historical crowdsourcing task and the corresponding answer for each response object.

Specifically, the server stores each historical crowdsourcing task, each respondent, and each response answer, together with the mapping relationship among respondents, response answers, and historical crowdsourcing tasks. Any historical crowdsourcing task has at least one response answer, and each respondent has at most one response answer for the same historical crowdsourcing task. For each historical crowdsourcing task, the response answer of each respondent to that task is obtained.
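One way to picture the mapping described in S1 is the in-memory sketch below. The class name `AnswerStore` and its methods are hypothetical; the application does not specify a storage layout, only that each respondent has at most one answer per task.

```python
from collections import defaultdict

class AnswerStore:
    """Maps each task to its respondents and their single answer for that task."""

    def __init__(self):
        # task_id -> {respondent_id: answer}
        self._answers = defaultdict(dict)

    def add(self, task_id, respondent_id, answer):
        # Enforce "at most one response answer per respondent per task".
        if respondent_id in self._answers[task_id]:
            raise ValueError("respondent already answered this task")
        self._answers[task_id][respondent_id] = answer

    def answers_for(self, task_id):
        """Return a copy of {respondent_id: answer} for one historical task."""
        return dict(self._answers.get(task_id, {}))
```

Keying the inner dictionary by respondent makes the one-answer constraint a simple membership check, and `answers_for` returns exactly the per-task respondent and answer pairs that step S1 retrieves.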

Among them, the crowdsourcing task in this embodiment refers to a task published over a network, in which an object participates and gives a corresponding answer through the network.

Among them, the respondent (response object) refers to the object that gives a response answer to the historical crowdsourcing task. Specifically, the respondents may be a plurality of preset network models. For example, if a network model K uses a preset method to identify and analyze crowdsourcing task A and gives a response answer, then network model K can be called a respondent of crowdsourcing task A.

In this embodiment, an electronic device (such as the server shown in FIG. 1) on which a sampling method of crowdsourcing tasks runs may be connected via a wired connection or a wireless connection. It should be pointed out that the above-mentioned wireless connection methods can include but are not limited to 3G/4G connection, WiFi connection, Bluetooth connection, WiMAX connection, Zigbee connection, UWB (ultra wideband) connection, as well as other wireless connection methods that are currently known or developed in the future.

S2: Parse the response answers to obtain sampling word segments, extract sampling keywords from the sampling word segments, and store the sampling keywords in the preset answer word database.

Specifically, the response answers are parsed to obtain word segments that express the semantics of the response answers, which serve as the sampling word segments; keywords are extracted from the sampling word segments as the sampling keywords, and the sampling keywords are stored in the preset answer word database. For the specific process, refer to the description of steps S21 to S24; to avoid repetition, it is not repeated here.

Among them, the parsing specifically includes but is not limited to: word segmentation processing, data cleaning, deduplication processing, and synonym replacement.

Among them, word segmentation processing refers to the process of recombining a continuous character sequence into a word sequence according to certain specifications. In this embodiment, it specifically refers to dividing the response answer into individual sampling word segments, so that these sampling word segments can later be used to extract sampling keywords.

Among them, the word segmentation processing can be specifically through a third-party word segmentation tool or a word segmentation algorithm.

Among them, common third-party word segmentation tools include, but are not limited to: Stanford NLP word segmentation, ICTCLAS word segmentation system, ansj word segmentation tool and HanLP Chinese word segmentation tool, etc.

Among them, word segmentation algorithms include but are not limited to: the Forward Maximum Matching (MM) algorithm, the Reverse Direction Maximum Matching (RMM) algorithm, the Bi-directional Maximum Matching (BM) algorithm, the dynamic programming algorithm, the Hidden Markov Model (HMM), and the N-gram model.

Preferably, this embodiment adopts a dynamic programming algorithm to perform word segmentation processing. For the specific process, refer to the description of step S21. In order to avoid repetition, it will not be repeated here.

Among them, data cleaning refers to the process of discovering and correcting identifiable errors in data files, including checking data consistency, handling invalid and missing values, and so on. In this embodiment, text standardization checks are performed on the response answers and word segmentation, and the interference of invalid items is eliminated, so as to improve the accuracy and efficiency of subsequent extraction of random keywords.

It should be noted that, as a preferred method, in this embodiment, storing the sampling keywords in the preset answer word database specifically includes: obtaining the historical crowdsourcing tasks corresponding to each preset task type; treating the sampling keywords corresponding to the historical crowdsourcing tasks of the same preset task type as one group of sampling keywords; and establishing a mapping relationship between each group of sampling keywords and its preset task type, and storing the mapping relationship in the preset answer word database.

Among them, the preset task types can be set according to actual needs and are not specifically limited here. For example, in a specific implementation, the preset task types include: essay questions, multiple-choice questions, fill-in-the-blank questions, and true or false questions, etc. For another example, in another specific implementation manner, the preset task types include: material collection, art manufacturing, planning, publicity design, and so on.
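The grouping of sampling keywords by preset task type could be sketched as below. The function name and the two input dictionaries (`task_types` mapping task id to type, `task_keywords` mapping task id to its keywords) are assumptions for illustration; the application does not prescribe these structures.

```python
from collections import defaultdict

def build_answer_word_database(task_types, task_keywords):
    """Group sampling keywords by the preset task type of the task they came from."""
    database = defaultdict(set)
    for task_id, keywords in task_keywords.items():
        database[task_types[task_id]].update(keywords)
    return dict(database)

task_types = {"t1": "essay", "t2": "essay", "t3": "multiple-choice"}
task_keywords = {"t1": {"alpha", "beta"}, "t2": {"beta", "gamma"}, "t3": {"A"}}
word_database = build_answer_word_database(task_types, task_keywords)
```

Here the keywords of the two essay tasks end up in a single group, so a later lookup by task type retrieves the whole group at once.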

By parsing the response answers to obtain the sampling word segments and extracting the sampling keywords from them, sampling keywords can be drawn from each respondent's answers across numerous historical tasks, which effectively makes the sampling inspection more targeted and improves its efficiency.

S3: For the sampling word segments in each response answer of each respondent, count the number of times the sampling word segments hit sampling keywords in the preset answer word database, as the base frequency corresponding to each response answer.

Specifically, the number of times the sampling word segments hit sampling keywords in the preset answer word database is counted to obtain a base frequency that reflects the reliability of the sampling word segments. For the specific process, refer to the description of steps S31 and S32; to avoid repetition, it is not repeated here.
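A minimal sketch of the hit counting in S3 follows. It assumes that every occurrence of a keyword in an answer counts as one hit (repeated words count repeatedly); that counting rule, like the function name, is an assumption of this example and is not confirmed by the text here.

```python
def base_frequency(answer_tokens, answer_keywords):
    """Count how many sampling word segments of one answer hit a sampling keyword."""
    keyword_set = set(answer_keywords)
    # Assumption: each occurrence of a keyword counts as a separate hit.
    return sum(1 for token in answer_tokens if token in keyword_set)
```

For example, an answer tokenized as `["a", "b", "a", "x"]` checked against the keywords `{"a", "b"}` has a base frequency of 3 under this rule.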

Among them, the reliability value is an evaluation of how reliable an object's responses to crowdsourcing tasks are, based on the object's responses to historical crowdsourcing tasks. Generally speaking, the higher the reliability value, the smaller the risk value of the response answers that the object returns.

S4: Determine the reliability value corresponding to each response object according to the basic times corresponding to each response answer.

Specifically, each response answer corresponds to a response object, and the reliability value corresponding to each response object can be determined through the basic times corresponding to each response answer. The reliability value corresponding to each respondent is obtained, which is used to determine the order of subsequent sampling, avoiding random sampling, and enhancing the pertinence of sampling. For the specific process of S4, please refer to the description of step S41 and step S42. To avoid repetition, it will not be repeated here.

S5: According to the order of the reliability value from small to large, a preset number of response objects are selected as sampling objects, and the response answers corresponding to the sampling objects are checked.

Specifically, the respondents are sorted by reliability value in ascending order to obtain a sequence of respondents; a preset number of respondents are then selected from the sequence, from front to back, as the sampling objects.

Among them, the preset number of respondents to select is set according to the actual sampling needs and is not specifically limited here. For example, the number can be arranged according to the manpower available for sampling inspection: if the available manpower can inspect only 100 selected objects, the first 100 respondents in the sequence ordered by reliability value are selected.
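The selection in S5 amounts to taking the respondents with the smallest reliability values. The sketch below assumes the reliability values are already available as a dictionary from respondent id to value (how they are computed in S4 is not reproduced here), and the function name is hypothetical.

```python
import heapq

def select_sampling_objects(reliability, preset_number):
    """Return the respondents with the lowest reliability values, lowest first."""
    return heapq.nsmallest(preset_number, reliability, key=reliability.get)

reliability = {"r1": 0.9, "r2": 0.2, "r3": 0.5}
targets = select_sampling_objects(reliability, preset_number=2)
```

With these illustrative values, `targets` is `["r2", "r3"]`, so inspection effort goes first to the least reliable respondents instead of a random subset.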

The sampling inspection results are sent to the terminal devices 101, 102, and 103, so that the respondents can learn the results.

In this embodiment, for each historical crowdsourcing task, the response answer of each respondent corresponding to the task is obtained; the response answers are parsed to obtain sampling word segments; and sampling keywords are extracted from the sampling word segments and stored in the preset answer word database. The sampling keywords obtained are used for the subsequent evaluation of each respondent's reliability value, making that evaluation more targeted. At the same time, for the sampling word segments in each response answer of each respondent, the number of times the sampling word segments hit the sampling keywords is counted as the base frequency corresponding to each response answer; the reliability value corresponding to each respondent is determined according to the base frequency; the sampling objects are determined according to the reliability values, and the response answers corresponding to the sampling objects are inspected. Determining a reliability value for each respondent and choosing sampling objects according to it makes sampling more targeted, and ordering respondents by reliability value avoids purely random sampling, which helps improve the efficiency of sampling inspection.

In one embodiment, the filtering of the initial word segmentation in step S22, and the synonym replacement performed on the filtered initial word segmentation to obtain the sampling word segments, are described in detail as follows:

By means of named entity recognition, synonymous replacement is performed on the initial word segmentation to obtain random word segmentation.

Specifically, by means of named entity recognition, synonymous replacement is performed on the initial word segmentation after the filtering process to obtain random word segmentation.

Among them, Named Entity Recognition (NER) is a basic method for determining the boundary of an entity, which is mainly related to word segmentation and discovering named entities. It is used to identify entities with specific meanings in text. It is an important part of the practical application of natural language processing. It has an important basic role in application fields such as information extraction, syntactic analysis, and machine translation. Named entity recognition must identify entity boundaries on the one hand, and identify entity categories on the other, such as names of people, places, and organizations.

In this embodiment, by means of named entity recognition, the filtered initial word segmentation is synonymously replaced and thereby further simplified to obtain the sampling word segmentation, which improves the accuracy of the sampling word segmentation.

Please refer to FIG. 4, which shows the specific implementation process in step S22 of synonymously replacing the initial word segmentation by means of named entity recognition to obtain the sampling word segmentation, described in detail as follows:

S221: Obtain a preset standard vocabulary dictionary.

Specifically, a standard vocabulary dictionary is set in the server in advance. The standard vocabulary dictionary can effectively filter out relatively redundant vocabulary, such as unwanted prepositions and adjectives, or synonyms and near-synonyms.

S222: For each initial word segmentation, perform named entity recognition on the initial word segmentation against each vocabulary in the standard vocabulary dictionary in a traversal manner, to obtain an entity recognition result.

Specifically, named entity recognition is performed on the initial word segmentation against each word in the standard vocabulary dictionary one by one, and different recognition results are obtained. The recognition result may be that the same named entity exists, or that the named entities are different.

For example, two recognized named entities may be different strings on the surface while both referring to the city of New York, in which case they need to be merged. Similarly, the recognized named entities "make CALL" and "cheers" are different strings, but both carry the meaning of "cheers", so the entity names need to be merged.

S223: If the entity recognition result is that the same named entity exists, obtain the initial word segmentation and the standard vocabulary corresponding to the recognition result, and replace the initial word segmentation with the standard vocabulary.

Specifically, an entity recognition result that the same named entity exists means that the semantics of the two word segmentations are the same. The initial word segmentation and the standard vocabulary corresponding to the recognition result are obtained, and the initial word segmentation is replaced with the standard vocabulary to obtain the sampling word segmentation.

For example, one initial word segmentation is the named entity "make CALL" and another is the named entity "cheer". The named entity recognition in step S222 yields the entity recognition result that they are the same named entity, so the initial word segmentations "make CALL" and "cheer" and the standard vocabulary "cheer" are obtained, and "make CALL" and "cheer" are both replaced with "cheer" to give the sampling word segmentation "cheer".

In this embodiment, further simplifying the initial word segmentation yields a sampling word segmentation that is closer to the purpose of the sampling inspection. Words such as prepositions and adjectives that carry no meaning in the initial word segmentation are filtered out, and words with the same or similar meaning are replaced, which improves the accuracy of the sampling word segmentation.
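Steps S221 to S223 can be illustrated with a minimal sketch. It is not the implementation of this application: named entity recognition is replaced here by a plain dictionary lookup, and the names `STANDARD_VOCABULARY`, `FILTERED_WORDS`, and `to_sampling_segments`, as well as the dictionary contents, are hypothetical.

```python
# Hypothetical standard vocabulary dictionary: surface form -> standard word.
# A real system would decide "same named entity" with an NER model instead.
STANDARD_VOCABULARY = {
    "make CALL": "cheer",
    "cheers": "cheer",
    "cheer": "cheer",
}

# Hypothetical list of words that carry no meaning for the sampling inspection.
FILTERED_WORDS = {"the", "of", "for", "a"}


def to_sampling_segments(initial_segments):
    """Filter the initial segments, then replace each with its standard word."""
    sampling_segments = []
    for segment in initial_segments:
        if segment.lower() in FILTERED_WORDS:
            continue  # drop words on the filter list
        # Keep the segment unchanged when no standard word is known for it.
        sampling_segments.append(STANDARD_VOCABULARY.get(segment, segment))
    return sampling_segments


segments = to_sampling_segments(["make CALL", "for", "the", "team", "cheers"])
print(segments)
```

With these assumed inputs, "make CALL" and "cheers" both collapse to the standard word "cheer", while "for" and "the" are filtered out.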

Please refer to FIG. 5, which shows a specific implementation of step S3. In step S3, for the sampling word segmentation in each response answer of each response object, the number of times the sampling word segmentation hits the sampling keywords in the preset answer word database is counted as the basic frequency corresponding to each response answer. The specific implementation process is described in detail as follows:

S31: Obtain each historical crowdsourcing task participated by the respondent as a reference task.

Specifically, each historical crowdsourcing task that each response object has participated in is stored in the server. Acquiring each historical crowdsourcing task that the response object has participated in and using it as a reference task makes it possible to cover every response answer of that response object.

S32: For each reference task, obtain sampling keywords from a preset answer vocabulary, and count the number of times the sampling word corresponding to the answer of the response object hits the sampling keywords as the basic frequency.

Specifically, the purpose of the sampling inspection is to check whether each response object's response answers to the crowdsourcing tasks are correct. By counting the number of times the sampling word segmentation of different response answers hits the sampling keywords, the ability of different response objects to complete crowdsourcing tasks can be known.

In this embodiment, by acquiring each historical crowdsourcing task that the respondent participates in as a reference task, and determining the basic number of times, it can provide a basis for the subsequent determination of the reliability value.
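The counting in steps S31 and S32 might look like the following sketch. The data layout (a dictionary from task identifier to keywords or segments) and the name `base_frequency` are assumptions made for illustration. Each keyword is counted at most once per answer, which is one way to guarantee that the number of hits never exceeds the number of sampling keywords.

```python
def base_frequency(sampling_segments, sampling_keywords):
    """Count the distinct sampling keywords hit by one answer's segments."""
    return len(set(sampling_segments) & set(sampling_keywords))


# Hypothetical preset answer word database: task id -> sampling keywords.
answer_word_database = {
    "task-1": ["cheer", "team"],
    "task-2": ["city", "park"],
}

# Hypothetical response answers of one response object, already segmented.
answers = {
    "task-1": ["cheer", "team", "win"],
    "task-2": ["river", "park"],
}

# One basic frequency per reference task the response object took part in.
frequencies = {
    task_id: base_frequency(segments, answer_word_database[task_id])
    for task_id, segments in answers.items()
}
print(frequencies)
```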

Please refer to FIG. 6, which shows a specific implementation of step S4. In step S4, the reliability value corresponding to each response object is determined according to the basic frequency corresponding to each response answer. The specific implementation process is described in detail as follows:

S41: According to the basic frequency, count the sum M of the sampling keywords corresponding to each reference task, and count the sum N of the number of hits by the response object for each reference task, where M and N are both positive integers and N is less than or equal to M.

S42: Calculate the reliability value δ using the formula δ = N/M.

For example, if the sum M of the sampling keywords corresponding to the reference tasks is 10 and the sum N of the number of hits by one response object for the reference tasks is 2, the formula δ = N/M gives a reliability value δ of 0.2; if the sum N of the number of hits by another response object is 4, its reliability value δ is 0.4. Because a higher reliability value means a lower risk value for the response answers fed back by the response object, the risk value for a reliability value δ of 0.4 is smaller than that for a reliability value δ of 0.2.
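The worked numbers above can be reproduced with a short sketch, reading the reliability formula as δ = N/M, which is what the example values imply. The function name `reliability` and the per-task lists are hypothetical, and accepting N = 0 is a simplifying assumption of this sketch.

```python
def reliability(keyword_counts, hit_counts):
    """Return delta = N / M for one response object.

    keyword_counts: number of sampling keywords of each reference task.
    hit_counts: number of keyword hits by the response object per task.
    """
    m = sum(keyword_counts)  # M: total sampling keywords over all tasks
    n = sum(hit_counts)      # N: total hits over all tasks
    if m <= 0 or not 0 <= n <= m:
        raise ValueError("expected M > 0 and 0 <= N <= M")
    return n / m


# Two reference tasks with 6 and 4 sampling keywords, so M = 10.
first = reliability([6, 4], [1, 1])   # N = 2
second = reliability([6, 4], [3, 1])  # N = 4
print(first, second)
```

The response object with the larger δ is the more reliable one and therefore carries the lower risk.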

In this embodiment, counting the basic frequency and computing a specific reliability value with the formula addresses the problem in the prior art that response answers are sampled too randomly. With an accurate reliability value, it is possible to know which response objects give trustworthy response answers and which give response answers that are too arbitrary.

It should be understood that the size of the sequence number of each step in the foregoing embodiment does not mean the order of execution. The execution sequence of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiment of the present application.

A person of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (ROM), or a random access memory (RAM), etc.

Please refer to FIG. 7. As an implementation of the method shown in FIG. 2, this application provides an embodiment of a sampling device for crowdsourcing tasks. The device embodiment corresponds to the method embodiment shown in FIG. 2, and the device can be applied to various electronic devices.

As shown in FIG. 7, the sampling device for crowdsourcing tasks in this embodiment includes an obtaining module 51, a parsing module 52, a statistics module 53, a determination module 54, and a selection module 55, wherein:

The obtaining module 51 is configured to obtain, for each historical crowdsourcing task, a response answer corresponding to the historical crowdsourcing task of each response object.

The parsing module 52 is used to parse the response answers to obtain sampling word segmentation, extract sampling keywords from the sampling word segmentation, and store the sampling keywords in a preset answer word database.

The statistics module 53 is used to count, for the sampling word segmentation in each response answer of each response object, the number of times the sampling word segmentation hits the sampling keywords, as the basic frequency corresponding to each response answer.

The determining module 54 is used to determine the reliability value corresponding to each response object according to the basic times corresponding to each response answer.

The selection module 55 is used to select a preset number of response objects as sampling objects in the order of the reliability value from small to large, and perform a check operation on the response answers corresponding to the sampling objects.
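The selection performed by the selection module 55 amounts to an ascending sort followed by a cut-off. The sketch below assumes the reliability values are already available in a dictionary; the name `select_sampling_objects` and the object identifiers are hypothetical.

```python
def select_sampling_objects(reliability_by_object, preset_number):
    """Pick the preset number of response objects with the lowest reliability."""
    # Sort by reliability value in ascending order (small to large).
    ranked = sorted(reliability_by_object.items(), key=lambda item: item[1])
    return [object_id for object_id, _ in ranked[:preset_number]]


# Hypothetical reliability values for three response objects.
reliabilities = {"object-a": 0.4, "object-b": 0.2, "object-c": 0.9}
sampling_objects = select_sampling_objects(reliabilities, 2)
print(sampling_objects)
```

Response objects with the lowest reliability values are checked first, so the inspection effort goes to the answers most likely to be wrong.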

Further, the parsing module 52 includes:

The word segmentation unit is used to use the dynamic programming algorithm to perform word segmentation processing on the response answer to obtain the initial word segmentation;

The sampling word segmentation determination unit is used to filter the initial word segmentation, and perform synonymous replacement on the initial word segmentation after the filtering process to obtain the sampling word segmentation;

The sampling word frequency determination unit is used to obtain, for the same historical crowdsourcing task, all the sampling word segmentation corresponding to the historical crowdsourcing task, and to count the number of occurrences of each sampling word segmentation to obtain the sampling word frequency corresponding to that sampling word segmentation;

The sampling keyword determination unit is used to take the sampling word segmentation whose sampling word frequency is greater than a preset word frequency as the sampling keywords, and to store the sampling keywords in the preset answer word database.
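The two frequency-related units above can be sketched as a count followed by a threshold. The function name `extract_sampling_keywords` and the sample data are hypothetical; the strict "greater than" comparison follows the wording of the text.

```python
from collections import Counter


def extract_sampling_keywords(segments_per_answer, preset_word_frequency):
    """Return the segments whose frequency within one task exceeds the threshold."""
    # Sampling word frequency: occurrences over all answers to the same task.
    counts = Counter(
        segment for segments in segments_per_answer for segment in segments
    )
    return {
        word for word, count in counts.items() if count > preset_word_frequency
    }


# Hypothetical sampling word segmentation of three answers to one task.
task_answers = [
    ["cheer", "team"],
    ["cheer", "win"],
    ["cheer", "team"],
]
keywords = extract_sampling_keywords(task_answers, 1)
print(sorted(keywords))
```

Here "cheer" occurs three times and "team" twice, so both exceed the assumed threshold of 1, while "win" occurs once and is dropped.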

Further, the sampling word segmentation determination unit includes:

The named entity recognition subunit is used to synonymously replace the initial word segmentation by means of named entity recognition to obtain the sampling word segmentation.

Further, the sampling word segmentation determination unit further includes:

Standard vocabulary dictionary acquisition subunit, used to acquire the preset standard vocabulary dictionary;

The entity recognition result determination subunit is used to perform named entity recognition on each initial word segmentation by traversing the initial word segmentation with each vocabulary in the standard vocabulary dictionary to obtain the entity recognition result;

The standard word segmentation substitution subunit is used to obtain, if the entity recognition result is that the same named entity exists, the initial word segmentation and the standard vocabulary corresponding to the recognition result, and to replace the initial word segmentation with the standard vocabulary.

Further, the statistics module 53 includes:

The reference task determination unit is used to obtain each historical crowdsourcing task participated by the respondent as a reference task;

The basic frequency determining unit is used to obtain, for each reference task, the sampling keywords from the preset answer word database, and to count the number of times the sampling word segmentation corresponding to the response answers of the response object hits the sampling keywords as the basic frequency.

Further, the determining module 54 includes:

The basic frequency statistics unit is used to count, according to the basic frequency, the sum M of the sampling keywords corresponding to each reference task, and to count the sum N of the number of hits by the response object for each reference task, where M and N are both positive integers and N is less than or equal to M;

The reliability value determining unit is used to calculate the reliability value δ using the formula δ = N/M.

In the sampling device for crowdsourcing tasks in the above scheme, the obtaining module 51 obtains, for each historical crowdsourcing task, the response answer of each response object corresponding to the historical crowdsourcing task. The parsing module 52 parses the response answers to obtain sampling word segmentation, extracts sampling keywords from the sampling word segmentation, and stores the sampling keywords in the preset answer word database; reducing the complicated sampling word segmentation to more targeted sampling keywords can effectively improve the sampling efficiency. The statistics module 53 counts, for the sampling word segmentation in each response answer of each response object, the number of times the sampling word segmentation hits the sampling keywords as the basic frequency corresponding to each response answer, which allows each response object to be converted into a corresponding reliability value and makes the sampling more targeted. The determination module 54 determines the reliability value corresponding to each response object according to the basic frequency corresponding to each response answer, and the selection module 55 then selects a preset number of response objects as sampling objects in ascending order of reliability value, which makes the sampling more targeted and helps to improve the efficiency of the sampling.

In order to solve the above technical problems, the embodiments of the present application also provide computer equipment. Please refer to FIG. 8 for details. FIG. 8 is a block diagram of the basic structure of the computer device in this embodiment.

The computer device 6 includes a memory 61, a processor 62, and a network interface 63 that are communicatively connected to one another via a system bus. It should be pointed out that the figure only shows a computer device 6 with three components, namely the memory 61, the processor 62, and the network interface 63, but it should be understood that not all of the illustrated components are required, and that more or fewer components may be implemented instead. Those skilled in the art will understand that the computer device here is a device that can automatically perform numerical calculation and/or information processing in accordance with preset or stored instructions. Its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), embedded devices, and the like.

Computer equipment can be computing equipment such as desktop computers, notebooks, palmtop computers, and cloud servers. The computer equipment can interact with the user through a keyboard, a mouse, a remote control, a touch panel, or a voice control device.

The memory 61 includes at least one type of readable storage medium. The readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static random access memory ( SRAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), programmable read only memory (PROM), magnetic memory, magnetic disks, optical disks, etc. In some embodiments, the memory 61 may be an internal storage unit of the computer device 6, such as a hard disk or a memory of the computer device 6. In other embodiments, the memory 61 may also be an external storage device of the computer device 6, such as a plug-in hard disk, a smart memory card (Smart Media Card, SMC), and a secure digital (Secure Digital, SD) card, flash memory card (Flash Card) and so on. Of course, the memory 61 may also include both the internal storage unit of the computer device 6 and its external storage device. In this embodiment, the memory 61 is generally used to store an operating system and various application software installed in the computer device 6, such as computer readable instructions for a crowdsourced task sampling method. In addition, the memory 61 may also be used to temporarily store various types of data that have been output or will be output.

The processor 62 may, in some embodiments, be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 62 is generally used to control the overall operation of the computer device 6. In this embodiment, the processor 62 is configured to run the computer-readable instructions or process the data stored in the memory 61, for example to run the computer-readable instructions of the sampling method for crowdsourcing tasks.

The network interface 63 may include a wireless network interface or a wired network interface, and the network interface 63 is generally used to establish a communication connection between the computer device 6 and other electronic devices.

This application also provides another implementation manner, namely a computer-readable storage medium, which may be non-volatile or volatile. The computer-readable storage medium stores a sampling inspection program, and the sampling inspection program can be executed by at least one processor, so that the at least one processor executes the steps of the aforementioned sampling method for crowdsourcing tasks.

Through the description of the above implementation manners, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course can also be implemented by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions to enable a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods of the embodiments of the present application.

Obviously, the embodiments described above are only a part of the embodiments of the present application, rather than all of them. The drawings show preferred embodiments of the present application, but do not limit its patent scope. The present application can be implemented in many different forms; these embodiments are provided so that the disclosure of the present application is understood more thoroughly and comprehensively. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in each of the foregoing specific embodiments, or equivalently replace some of the technical features. All equivalent structures made by using the contents of the description and drawings of this application, whether used directly or indirectly in other related technical fields, are likewise within the scope of patent protection of this application.

Claims

A sampling method for crowdsourcing tasks, including:

For each historical crowdsourcing task, obtain each response object participating in the historical crowdsourcing task and the response answer corresponding to each response object;

Parse the response answer to obtain sampling word segmentation, extract sampling keywords from the sampling word segmentation, and store the sampling keywords in a preset answer word database;

For the sampling word segmentation in each response answer of each response object, count the number of times the sampling word segmentation hits the sampling keywords in the preset answer word database, as the basic frequency corresponding to each response answer;

Determine the reliability value corresponding to each response object according to the basic times corresponding to each response answer;

According to the order of the reliability value from small to large, a preset number of response objects are selected as sampling objects, and the response answers corresponding to the sampling objects are checked.
The sampling method for crowdsourcing tasks according to claim 1, wherein said performing analysis processing on said response answer to obtain sampling word segmentation comprises:

Use a dynamic programming algorithm to perform word segmentation processing on the response answer to obtain an initial word segmentation;

Perform filtering processing on the initial word segmentation, and perform synonymous replacement on the initial word segmentation after the filtering processing to obtain the sampled word segmentation.
The sampling method for crowdsourcing tasks according to claim 1, wherein said extracting sampling keywords from said sampling segmentation and storing said sampling keywords in a preset answer vocabulary comprises:

For the same historical crowdsourcing task, obtain all the sampling word segmentation corresponding to the historical crowdsourcing task, and count the number of occurrences of each sampling word segmentation to obtain the sampling word frequency corresponding to that sampling word segmentation;

Use the sampling word segmentation whose sampling word frequency is greater than a preset word frequency as the sampling keywords, and store the sampling keywords in the preset answer word database.
The sampling method for crowdsourcing tasks according to claim 2, wherein said synonymously replacing the initial word segmentation after filtering processing to obtain said sampling word segmentation comprises:

By means of named entity recognition, synonymous replacement is performed on the initial word segmentation to obtain the sampled word segmentation.
The sampling method for crowdsourcing tasks according to claim 4, wherein said synonymously replacing said initial word segmentation by means of named entity recognition comprises:

Obtain a preset standard vocabulary dictionary;

For each of the initial word segmentation, perform named entity recognition on the initial word segmentation with each vocabulary in the standard vocabulary dictionary by means of traversal, to obtain an entity recognition result;

If the entity recognition result is that the same named entity exists, the initial word segmentation and standard vocabulary corresponding to the recognition result are obtained, and the standard word segmentation is used to replace the initial word segmentation.
The sampling method for crowdsourcing tasks according to any one of claims 1 to 5, wherein said counting, for the sampling word segmentation in each response answer of each response object, the number of times the sampling word segmentation hits the sampling keywords in the preset answer word database, as the basic frequency corresponding to each response answer, comprises:

Obtain each historical crowdsourcing task that the respondent has participated in as a reference task;

For each of the reference tasks, sampling keywords are obtained from the preset answer vocabulary, and the number of times the sampling word corresponding to the answer of the response object hits the sampling keywords is counted as the basic frequency.
The sampling method for crowdsourcing tasks according to any one of claims 1 to 5, wherein the determining the reliability value corresponding to each response object according to the basic times corresponding to each response answer comprises:

According to the basic frequency, count the sum M of the sampling keywords corresponding to each of the reference tasks, and count the sum N of the number of hits by the response object for each of the reference tasks, where M and N are both positive integers and N is less than or equal to M;

Calculate the reliability value δ using the formula δ = N/M.
A sampling device for crowdsourcing tasks, including:

The obtaining module is used to obtain, for each historical crowdsourcing task, the answer of each response object corresponding to the historical crowdsourcing task;

The parsing module is used to analyze and process the response answers to obtain sampling word segmentation, extract sampling keywords of the sampling word segmentation, and store the sampling keywords in a preset answer vocabulary;

The statistics module is used to count, for the sampling word segmentation in each response answer of each response object, the number of times the sampling word segmentation hits the sampling keywords in the preset answer word database, as the basic frequency corresponding to each response answer;

The determination module is used to determine the reliability value corresponding to each response object according to the basic times corresponding to each response answer;

The selection module is used to select a preset number of response objects as sampling objects in ascending order of the reliability value, and perform a check operation on the response answers corresponding to the sampling objects.
A computer device, comprising a memory and a processor, wherein the memory stores computer-readable instructions executable on the processor, and the processor, when executing the computer-readable instructions, implements the following steps of the sampling method for crowdsourcing tasks:

For each historical crowdsourcing task, obtain each response object participating in the historical crowdsourcing task and the response answer corresponding to each response object;

Parse the response answer to obtain sampling word segmentation, extract sampling keywords from the sampling word segmentation, and store the sampling keywords in a preset answer word database;

For the sampling word segmentation in each response answer of each response object, count the number of times the sampling word segmentation hits the sampling keywords in the preset answer word database, as the basic frequency corresponding to each response answer;

Determine the reliability value corresponding to each response object according to the basic times corresponding to each response answer;

According to the order of the reliability value from small to large, a preset number of response objects are selected as sampling objects, and the response answers corresponding to the sampling objects are checked.
The computer device according to claim 9, wherein said parsing the response answer to obtain the sampling word segmentation comprises:

Use a dynamic programming algorithm to perform word segmentation processing on the response answer to obtain an initial word segmentation;

Perform filtering processing on the initial word segmentation, and perform synonymous replacement on the initial word segmentation after the filtering processing to obtain the sampled word segmentation.
The computer device according to claim 9, wherein said extracting the sampling keywords from the sampling word segmentation and storing the sampling keywords in a preset answer word database comprises:

For the same historical crowdsourcing task, obtain all the sampling word segmentation corresponding to the historical crowdsourcing task, and count the number of occurrences of each sampling word segmentation to obtain the sampling word frequency corresponding to that sampling word segmentation;

Use the sampling word segmentation whose sampling word frequency is greater than a preset word frequency as the sampling keywords, and store the sampling keywords in the preset answer word database.
The computer device according to claim 10, wherein said performing synonymous replacement on the filtered initial word segmentation to obtain the sampling word segmentation comprises:

By means of named entity recognition, synonymous replacement is performed on the initial word segmentation to obtain the sampled word segmentation.
The computer device according to claim 12, wherein said synonymously replacing said initial word segmentation by means of named entity recognition comprises:

Obtain a preset standard vocabulary dictionary;

For each of the initial word segmentation, perform named entity recognition on the initial word segmentation with each vocabulary in the standard vocabulary dictionary by means of traversal, to obtain an entity recognition result;

If the entity recognition result is that the same named entity exists, the initial word segmentation and standard vocabulary corresponding to the recognition result are obtained, and the standard word segmentation is used to replace the initial word segmentation.
The computer device according to any one of claims 9 to 13, wherein said counting, for the sampling word segmentation in each response answer of each response object, the number of times the sampling word segmentation hits the sampling keywords in the preset answer word database, as the basic frequency corresponding to each response answer, comprises:

Obtain each historical crowdsourcing task that the respondent has participated in as a reference task;

For each of the reference tasks, obtain sampling keywords from the preset answer word database, and count the number of times that the random word segmentation corresponding to the answer of the response object hits the sampling keywords as the basic frequency;

The determining the reliability value corresponding to each response object according to the basic times corresponding to each response answer includes:

According to the basic frequency, count the sum M of the sampling keywords corresponding to each of the reference tasks, and count the sum N of the number of hits by the response object for each of the reference tasks, where M and N are both positive integers and N is less than or equal to M;

Calculate the reliability value δ using the formula δ = N/M.
A computer-readable storage medium having computer-readable instructions stored thereon, and when the computer-readable instructions are executed by a processor, the steps of the sampling method for crowdsourcing tasks as described below are realized:

For each historical crowdsourcing task, obtain each response object participating in the historical crowdsourcing task and the response answer corresponding to each response object;

Parse the response answer to obtain sampling word segmentation, extract sampling keywords from the sampling word segmentation, and store the sampling keywords in a preset answer word database;

For the sampling word segmentation in each response answer of each response object, count the number of times the sampling word segmentation hits the sampling keywords in the preset answer word database, as the basic frequency corresponding to each response answer;

Determine the reliability value corresponding to each response object according to the basic times corresponding to each response answer;

According to the order of the reliability value from small to large, a preset number of response objects are selected as sampling objects, and the response answers corresponding to the sampling objects are checked.
The computer-readable storage medium according to claim 15, wherein said parsing the response answer to obtain the sampling word segmentation comprises:

Use a dynamic programming algorithm to perform word segmentation processing on the response answer to obtain an initial word segmentation;

Perform filtering processing on the initial word segmentation, and perform synonymous replacement on the initial word segmentation after the filtering processing to obtain the sampled word segmentation.
17. The computer-readable storage medium according to claim 15, wherein extracting the sampling keywords from the sampled word segments and storing the sampling keywords in the preset answer vocabulary comprises:

For the same historical crowdsourcing task, obtain all sampled word segments corresponding to that historical crowdsourcing task, and count the number of occurrences of each sampled word segment to obtain the sampling word frequency corresponding to that sampled word segment;

Take each sampled word segment whose sampling word frequency is greater than a preset word frequency as a sampling keyword, and store the sampling keywords in the preset answer vocabulary.
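The frequency-threshold extraction described above can be sketched in Python as follows. The function name `extract_sampling_keywords` and the representation of the answers as lists of segments are illustrative assumptions; the threshold comparison (strictly greater than the preset word frequency) follows the claim text.

```python
from collections import Counter


def extract_sampling_keywords(segments_per_answer, preset_word_frequency):
    """Return the segments whose occurrence count exceeds the preset frequency.

    `segments_per_answer` holds, for one historical crowdsourcing task, the
    sampled word segments of every response answer to that task.
    """
    counts = Counter(w for segments in segments_per_answer for w in segments)
    return {w for w, c in counts.items() if c > preset_word_frequency}


# Hypothetical answers to a single image-labelling task.
task_answers = [["cat", "black"], ["cat", "white"], ["cat", "black", "small"]]
keywords = extract_sampling_keywords(task_answers, 1)
```

The resulting set would be stored in the preset answer vocabulary under the identifier of the task it was derived from.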
18. The computer-readable storage medium according to claim 16, wherein performing synonym replacement on the filtered initial word segments to obtain the sampled word segments comprises:

Perform synonym replacement on the initial word segments by means of named entity recognition to obtain the sampled word segments.
19. The computer-readable storage medium according to claim 18, wherein performing synonym replacement on the initial word segments by means of named entity recognition comprises:

Obtain a preset standard vocabulary dictionary;

For each initial word segment, traverse the standard vocabulary dictionary and perform named entity recognition on the initial word segment together with each standard word in the dictionary to obtain an entity recognition result;

If the entity recognition result indicates that the same named entity exists, obtain the initial word segment and the standard word corresponding to that recognition result, and replace the initial word segment with the standard word.
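The traversal-and-replace logic can be sketched in Python as below. The claim does not specify the named entity recognition model, so the predicate `same_entity` and the alias table `ALIASES` here are purely hypothetical stand-ins for it (a simple lookup, not actual entity recognition); only the traversal over the standard vocabulary dictionary and the replacement step mirror the claim.

```python
# Hypothetical alias table standing in for a real named entity recognizer.
ALIASES = {"nyc": "New York City", "big apple": "New York City"}


def same_entity(segment, standard_word):
    """Stand-in check: do the segment and the standard word name the same entity?"""
    key = segment.lower()
    return key == standard_word.lower() or ALIASES.get(key) == standard_word


def replace_with_standard(initial_segments, standard_vocabulary, is_same_entity):
    """Replace each segment with the first standard word naming the same entity.

    Segments with no matching standard word are kept unchanged.
    """
    replaced = []
    for segment in initial_segments:
        standard = next(
            (s for s in standard_vocabulary if is_same_entity(segment, s)),
            segment,
        )
        replaced.append(standard)
    return replaced


sampled = replace_with_standard(
    ["NYC", "weather", "Big Apple"], ["New York City", "Beijing"], same_entity
)
```

Passing the recognizer as a parameter keeps the traversal independent of whichever entity recognition method is actually used.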
20. The computer-readable storage medium according to any one of claims 15 to 19, wherein, for the sampled word segments in each response answer of each response object, counting the number of times the sampled word segments hit sampling keywords in the preset answer vocabulary, as the base frequency corresponding to each response answer, comprises:

Obtain each historical crowdsourcing task in which the response object has participated as a reference task;

For each reference task, obtain the sampling keywords from the preset answer vocabulary, and count the number of times the sampled word segments of the response object's answer hit the sampling keywords, as the base frequency;

Determining the reliability value corresponding to each response object according to the base frequency corresponding to each response answer comprises:

Based on the base frequencies, compute the total number M of sampling keywords corresponding to the reference tasks, and the total number N of sampling keywords hit by the response object across the reference tasks, where M and N are both positive integers and N is less than or equal to M;

Calculate the reliability value δ from M and N according to the formula.
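The counting of M and N can be illustrated with the Python sketch below. The formula for δ is not reproduced in this text, so the ratio N / M used here is only an assumption chosen because it is consistent with N ≤ M; it should not be read as the formula of the claim. The function name `reliability_value` and the data layout are likewise illustrative, and each keyword is counted as hit at most once per task so that N cannot exceed M.

```python
def reliability_value(answers, answer_vocabulary):
    """Compute M, N and an assumed reliability value for one response object.

    `answers` maps each reference task to the sampled word segments of the
    response object's answer; `answer_vocabulary` maps each task to its set
    of sampling keywords.
    """
    m = 0  # total number of sampling keywords over the reference tasks
    n = 0  # total number of distinct keywords hit by the response object
    for task_id, segments in answers.items():
        task_keywords = answer_vocabulary[task_id]
        m += len(task_keywords)
        n += len(task_keywords & set(segments))
    if m == 0:
        raise ValueError("M must be a positive integer")
    return m, n, n / m  # ASSUMPTION: delta = N / M; the source formula is not shown


# Hypothetical vocabulary and answers for two reference tasks.
vocabulary = {"t1": {"cat", "black"}, "t2": {"dog", "brown", "large"}}
worker_answers = {"t1": ["cat", "black", "small"], "t2": ["dog", "white"]}
m, n, delta = reliability_value(worker_answers, vocabulary)
```

In this hypothetical example the response object hits 3 of the 5 sampling keywords, so M = 5 and N = 3.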