WO2021174829A1 - Crowdsourced task inspection method, apparatus, computer device, and storage medium - Google Patents

Crowdsourced task inspection method, apparatus, computer device, and storage medium

Info

Publication number
WO2021174829A1
WO2021174829A1 PCT/CN2020/118461 CN2020118461W WO2021174829A1 WO 2021174829 A1 WO2021174829 A1 WO 2021174829A1 CN 2020118461 W CN2020118461 W CN 2020118461W WO 2021174829 A1 WO2021174829 A1 WO 2021174829A1
Authority
WO
WIPO (PCT)
Prior art keywords
word segmentation
answer
response
sampling
sampled
Prior art date
Application number
PCT/CN2020/118461
Other languages
French (fr)
Chinese (zh)
Inventor
王健宗
李佳琳
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021174829A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/10: Office automation; Time management
    • G06Q10/103: Workflow collaboration or project management

Definitions

  • This application relates to the field of data processing technology, and in particular to sampling inspection methods, devices, computer equipment and storage media for crowdsourced tasks.
  • Crowdsourcing refers to the practice of a company or organization outsourcing work that was previously performed by employees to a non-specific (and usually large) network of people on a free and voluntary basis.
  • The participants on a crowdsourcing platform fall into two categories: the person who publishes a task on the platform is called the task publisher, and the person who completes the task is called the respondent.
  • The task publisher publishes the task on the platform, and the respondent receives a reward for completing it.
  • Crowdsourcing helps task publishers reach a large number of freely participating respondents and solve practical problems by drawing on their collective knowledge.
  • The inventor has found that, because the respondents' fields of expertise and levels of professionalism are uncertain, the correctness of the collected answers to crowdsourcing tasks needs to be spot-checked.
  • The current method is to randomly select a preset number of response answers from all the response answers, check them, and evaluate the crowdsourcing task according to the inspection results.
  • This random sampling method is poorly targeted, which makes the evaluation of crowdsourcing tasks unsatisfactory and the sampling of crowdsourcing tasks inefficient. How to sample crowdsourcing tasks in a targeted manner, and thereby improve sampling efficiency, has become an urgent problem to be solved.
  • The purpose of the embodiments of the present application is to propose a sampling method for crowdsourcing tasks that solves the problem in the prior art that random sampling is poorly targeted, resulting in low sampling efficiency for crowdsourcing tasks.
  • an embodiment of the present application provides a sampling method for crowdsourcing tasks, including:
  • obtaining each response object participating in the historical crowdsourcing task and the response answer corresponding to each response object;
  • a preset number of response objects are selected as sampling objects, and the response answers corresponding to the sampling objects are checked.
  • a technical solution adopted in this application is to provide a sampling device for crowdsourcing tasks, including:
  • the obtaining module is used to obtain, for each historical crowdsourcing task, the answer of each response object corresponding to the historical crowdsourcing task;
  • the parsing module is used to analyze and process the response answers to obtain sampling word segmentation, extract sampling keywords of the sampling word segmentation, and store the sampling keywords in a preset answer vocabulary;
  • the statistics module is used to count, for the sampling word segments in each response answer of each response object, the number of times the sampling word segments hit the sampling keywords in the preset answer lexicon, as the basic number of times corresponding to each response answer;
  • the determination module is used to determine the reliability value corresponding to each response object according to the basic times corresponding to each response answer;
  • the selection module is used to select a preset number of response objects as sampling objects in the descending order of the reliability value, and perform a check operation on the response answers corresponding to the sampling objects.
  • a technical solution adopted in this application is to provide a computer device, including a memory and a processor, where the memory stores computer-readable instructions that run on the processor, and the processor implements the following steps of the sampling method for crowdsourcing tasks when executing the computer-readable instructions:
  • obtaining each response object participating in the historical crowdsourcing task and the response answer corresponding to each response object;
  • a preset number of response objects are selected as sampling objects, and the response answers corresponding to the sampling objects are checked.
  • a technical solution adopted in this application is to provide a computer-readable storage medium that stores computer-readable instructions which, when executed by a processor, implement the following steps of the sampling method for crowdsourcing tasks:
  • obtaining each response object participating in the historical crowdsourcing task and the response answer corresponding to each response object;
  • a preset number of response objects are selected as sampling objects, and the response answers corresponding to the sampling objects are checked.
  • In the sampling method for crowdsourcing tasks of the above scheme, for each historical crowdsourcing task the response answer of each response object is obtained and parsed to obtain sampling word segments; sampling keywords are extracted from the sampling word segments and stored in the preset answer lexicon.
  • The sampling keywords thus obtained are used in the subsequent evaluation of each response object's reliability value, which makes that evaluation more targeted.
  • FIG. 1 is a schematic diagram of the application environment of the sampling method for crowdsourcing tasks provided by an embodiment of the present application.
  • FIG. 2 is an implementation flowchart of the sampling method for crowdsourcing tasks according to an embodiment of the present application.
  • FIG. 3 is an implementation flowchart of step S2 in the sampling method for crowdsourcing tasks provided by an embodiment of the present application.
  • FIG. 4 is an implementation flowchart of step S221 in the sampling method for crowdsourcing tasks provided by an embodiment of the present application.
  • FIG. 5 is an implementation flowchart of step S3 in the sampling method for crowdsourcing tasks provided by an embodiment of the present application.
  • FIG. 6 is an implementation flowchart of step S4 in the sampling method for crowdsourcing tasks provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a sampling device for crowdsourcing tasks provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a computer device provided by an embodiment of the present application.
  • the system architecture 100 may include terminal devices 101, 102, and 103, a network 104 and a server 105.
  • the network 104 is used to provide a medium for communication links between the terminal devices 101, 102, 103 and the server 105.
  • the network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, and so on.
  • the user can use the terminal devices 101, 102, and 103 to interact with the server 105 through the network 104 to receive or send messages and so on.
  • Various communication client applications such as web browser applications, search applications, instant messaging tools, etc., may be installed on the terminal devices 101, 102, and 103.
  • the terminal devices 101, 102, 103 may be various electronic devices with display screens and supporting web browsing, including but not limited to smart phones, tablet computers, laptop computers, desktop computers, and so on.
  • the server 105 may be a server that provides various services, for example, a background server that provides support for pages displayed on the terminal devices 101, 102, and 103.
  • The sampling method for crowdsourcing tasks provided by the embodiments of the present application is generally executed by a server. Accordingly, the sampling device for crowdsourcing tasks is generally set in the server.
  • The numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. There can be any number of terminal devices, networks, and servers according to implementation needs.
  • FIG. 2 shows a specific implementation of the crowdsourced task sampling method.
  • the method of the present application is not limited to the sequence of the process shown in FIG. 2, and the method includes the following steps:
  • the server stores each historical crowdsourced task, response object, response answer, and the mapping relationship between the response object, response answer, and historical crowdsourced task.
  • For the same historical crowdsourcing task, each response object corresponds to at most one response answer.
  • For each historical crowdsourcing task, obtain the response answer given by each response object to that task.
  • the crowdsourcing task in this embodiment refers to a task in a network manner that allows an object to participate in the task and give a corresponding answer through the network.
  • the response object refers to the object that gives the corresponding response answer to the historical crowdsourcing task.
  • A response object can be, for example, one of a plurality of preset network models.
  • If a network model K uses a preset method to identify and analyze crowdsourcing task A and gives a response answer, network model K can be called a response object of crowdsourcing task A.
  • an electronic device such as the server shown in FIG. 1 on which a sampling method of crowdsourcing tasks runs may be connected via a wired connection or a wireless connection.
  • wireless connection methods can include but are not limited to 3G/4G connection, WiFi connection, Bluetooth connection, WiMAX connection, Zigbee connection, UWB (ultra wideband) connection, as well as other wireless connection methods that are currently known or developed in the future.
  • S2: Parse the response answers to obtain sampling word segments, extract sampling keywords from the sampling word segments, and store the sampling keywords in the preset answer lexicon.
  • The response answer is parsed to obtain the word segments that express its semantics, which serve as the sampling word segments; keywords are extracted from the sampling word segments as the sampling keywords, and the sampling keywords are stored in the preset answer lexicon.
  • For the specific process, refer to the description of steps S21 to S24; it is not repeated here.
  • the analysis processing specifically includes but is not limited to: word segmentation processing, data cleaning, deduplication processing, and synonymous substitution, etc.
  • Word segmentation refers to the process of recombining a continuous character sequence into a word sequence according to certain specifications. In this embodiment, it specifically refers to dividing the response answer into individual sampling word segments, so that these segments can later be used for extracting sampling keywords.
  • the word segmentation processing can be specifically through a third-party word segmentation tool or a word segmentation algorithm.
  • common third-party word segmentation tools include, but are not limited to: Stanford NLP word segmentation, ICTCLAS word segmentation system, ansj word segmentation tool and HanLP Chinese word segmentation tool, etc.
  • word segmentation algorithms include but are not limited to: the forward Maximum Matching (MM) algorithm, the Reverse Direction Maximum Matching (RMM) algorithm, the Bi-directional Maximum Matching (BM) algorithm, dynamic programming algorithms, the Hidden Markov Model (HMM), and the N-gram model, etc.
  • this embodiment adopts a dynamic programming algorithm to perform word segmentation processing.
  • For the specific process, refer to the description of step S21; it is not repeated here.
  • data cleaning refers to the process of discovering and correcting identifiable errors in data files, including checking data consistency, handling invalid and missing values, and so on.
  • text standardization checks are performed on the response answers and word segmentation, and the interference of invalid items is eliminated, so as to improve the accuracy and efficiency of subsequent extraction of random keywords.
  • Storing the sampling keywords in the preset answer lexicon specifically includes: obtaining the historical crowdsourcing tasks corresponding to each preset task type; treating the sampling keywords of the historical crowdsourcing tasks of the same preset task type as one group of sampling keywords; establishing the mapping relationship between each group of sampling keywords and its preset task type; and storing the mapping relationship in the preset answer lexicon.
  • the preset task types can be set according to actual needs and are not specifically limited here.
  • For example, the preset task types may include: essay questions, multiple-choice questions, fill-in-the-blank questions, and true-or-false questions, etc.
  • As another example, the preset task types may include: material collection, art production, planning, publicity design, and so on.
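  • The grouping of sampling keywords by preset task type described above can be sketched as follows. This is only an illustrative sketch: the function name `build_answer_lexicon`, the task identifiers, and the sample keywords are hypothetical, and the source does not specify how the lexicon is physically stored.

```python
from collections import defaultdict


def build_answer_lexicon(task_types, task_keywords):
    """Group sampling keywords by the preset task type of their historical task.

    task_types:    {task_id: preset task type}
    task_keywords: {task_id: iterable of sampling keywords for that task}
    """
    lexicon = defaultdict(set)
    for task_id, keywords in task_keywords.items():
        # Keywords of tasks sharing a task type end up in the same group.
        lexicon[task_types[task_id]].update(keywords)
    return dict(lexicon)


# Hypothetical historical tasks, for illustration only.
task_types = {"task_1": "essay", "task_2": "essay", "task_3": "multiple-choice"}
task_keywords = {
    "task_1": {"sampling", "keyword"},
    "task_2": {"keyword", "lexicon"},
    "task_3": {"option B"},
}
answer_lexicon = build_answer_lexicon(task_types, task_keywords)
```

  • Using a set per task type keeps each keyword once per group, so a keyword shared by several tasks of the same type is not stored twice.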
  • Parsing the response answers yields the sampling word segments, from which the sampling keywords are extracted.
  • Because the sampling keywords are extracted from the response answers that each response object gave across numerous historical tasks, they make the sampling more targeted and improve its efficiency.
  • The number of times the sampling word segments hit the sampling keywords is counted to obtain the basic number of times, which expresses the reliability of the sampling word segments.
  • For the specific process, refer to the description of steps S31 and S32; it is not repeated here.
  • The reliability value is an evaluation of how reliable an object's response to a crowdsourcing task is, based on that object's responses to historical crowdsourcing tasks.
  • The higher the reliability value, the smaller the risk value of the response answer that the object returns.
  • each response answer corresponds to a response object
  • the reliability value corresponding to each response object can be determined through the basic times corresponding to each response answer.
  • the reliability value corresponding to each respondent is obtained, which is used to determine the order of subsequent sampling, avoiding random sampling, and enhancing the pertinence of sampling.
  • For the specific process, refer to the description of steps S41 and S42; it is not repeated here.
  • The response objects are sorted to obtain a sequence of response objects; proceeding from front to back, a preset number of response objects are selected from the sequence as the sampling objects.
  • The preset number of response objects to select is set according to the actual sampling needs and is not specifically limited here.
  • The number to sample can be arranged according to the manpower available for the inspection. For example, if the available manpower can only check 100 objects, the response objects whose reliability values rank in the top 100 are selected.
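  • The selection step can be sketched as a sort followed by a slice. The sketch below assumes, as stated for the selection module, that response objects are ranked in descending order of reliability value; the function name and the sample values are hypothetical.

```python
def select_sampling_objects(reliability, preset_number):
    """Return the `preset_number` response objects with the highest reliability values.

    reliability: {response object id: reliability value}
    """
    # sorted() is stable, so objects with equal values keep their input order.
    ranked = sorted(reliability, key=reliability.get, reverse=True)
    return ranked[:preset_number]


# Hypothetical reliability values, for illustration only.
reliability = {"respondent_a": 0.2, "respondent_b": 0.4, "respondent_c": 0.3}
sampling_objects = select_sampling_objects(reliability, 2)
```

  • With the values above, `sampling_objects` is `["respondent_b", "respondent_c"]`; in practice `preset_number` would be set from the inspection capacity, such as the 100 objects mentioned above.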
  • In this embodiment, for each historical crowdsourcing task, the response answer of each response object is obtained and parsed to obtain sampling word segments, and sampling keywords are extracted from the sampling word segments.
  • The sampling keywords are stored in the preset answer lexicon and used in the subsequent evaluation of each response object's reliability value, making that evaluation more targeted. At the same time, for the sampling word segments in each response answer of each response object, the number of times they hit the sampling keywords is counted as the basic number of times corresponding to that response answer, and the reliability value of each response object is determined from the basic number of times.
  • The sampling objects are determined according to the reliability values, and the response answers corresponding to the sampling objects are checked. Determining the sampling objects by reliability value makes the sampling more targeted, and ranking the response objects by reliability value avoids purely random sampling, which helps improve the efficiency of sampling inspection.
  • In step S2, the specific implementation process of parsing the response answer to obtain sampling word segments, extracting sampling keywords from the sampling word segments, and storing the sampling keywords in the preset answer lexicon is described in detail as follows:
  • S21 Use the dynamic programming algorithm to perform word segmentation processing on the response answer to obtain the initial word segmentation.
  • the response answers are often complicated and need to be simplified.
  • The response answers are segmented, and the word segments relevant to the sampling inspection are extracted to obtain the initial word segments.
  • the dynamic programming algorithm is usually used to solve problems with certain optimal properties; in this application, the dynamic programming algorithm is used to obtain the optimal initial word segmentation.
  • the basic idea of the dynamic programming algorithm is to decompose the problem to be solved into several sub-problems, first solve the sub-problems, and then obtain the solution of the original problem from the solutions of these sub-problems.
  • The Viterbi algorithm is used to perform word segmentation on the response answer to obtain the initial word segments.
  • The Viterbi algorithm is a dynamic programming algorithm that finds the most likely sequence of hidden states to explain a sequence of observations.
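  • To illustrate how dynamic programming yields an optimal segmentation, here is a minimal sketch of a Viterbi-style maximum-probability segmenter over a unigram dictionary. It is not the patent's implementation: the source does not give the model, so the dictionary `WORD_PROB`, its probabilities, and the unknown-character penalty are all assumptions made for illustration, and a full Viterbi decoder over a Hidden Markov Model would also track hidden tag states.

```python
from __future__ import annotations

import math

# Hypothetical unigram probabilities; a real system would estimate them from a corpus.
WORD_PROB = {
    "crowd": 0.05,
    "sourcing": 0.04,
    "crowdsourcing": 0.08,
    "task": 0.10,
    "tasks": 0.06,
    "s": 0.001,
}

UNKNOWN_LOG_PROB = math.log(1e-8)  # assumed penalty for one out-of-vocabulary character
MAX_WORD_LEN = max(len(word) for word in WORD_PROB)


def segment(text: str) -> list[str]:
    """Return the highest-probability split of `text` into dictionary words."""
    n = len(text)
    # best[i] is the best log-probability of any segmentation of text[:i].
    best = [-math.inf] * (n + 1)
    # back[i] is the start index of the last word in that best segmentation.
    back = [0] * (n + 1)
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            word = text[start:end]
            if word in WORD_PROB:
                score = math.log(WORD_PROB[word])
            elif len(word) == 1:
                score = UNKNOWN_LOG_PROB  # fall back to a single unknown character
            else:
                continue
            # Reuse the already-solved sub-problem best[start].
            if best[start] + score > best[end]:
                best[end] = best[start] + score
                back[end] = start
    # Follow the back-pointers from the end to recover the words.
    words = []
    i = n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]
```

  • With this toy dictionary, `segment("crowdsourcingtasks")` prefers the two-word split "crowdsourcing" + "tasks" over "crowd" + "sourcing" + "tasks", because the product of the word probabilities is larger.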
  • S22 Perform filtering processing on the initial word segmentation, and perform synonymous replacement on the initial word segmentation after the filtering processing to obtain random word segmentation.
  • The initial word segments obtained from the parsing are filtered to remove unnecessary or redundant words, leaving word segments that better fit the sampling needs; synonymous replacement is then performed on the filtered initial word segments to obtain the sampling word segments.
  • Synonymous replacement converts synonyms and near-synonyms among the word segments into a consistent standard vocabulary, further simplifying the word segments.
  • The filtering process filters out initial word segments, such as adjectives and adverbs, that contribute little to the semantic description, and retains the initial word segments that play a key role in semantic expression, such as nouns, verbs, and quantifiers.
  • For example, the initial word segments obtained are "Zhang San", "Zheng", "Crazy", and "Call CALL". "Zheng" and "Crazy" are both adverbs and can be filtered out; "Zhang San" is a specific person's name, and "Call CALL" is a specific action. After the filtering process, the two word segments "Zhang San" and "Call CALL" remain; synonymous replacement is then performed on "Call CALL", yielding the sampling word segments "Zhang San" and "Cheers".
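  • The filtering and synonymous replacement in this example can be sketched as below. The part-of-speech tags, the set `KEPT_POS`, and the table `STANDARD_VOCAB` are hypothetical stand-ins: the source does not say which tagger or synonym dictionary is used, so the tags are supplied by hand here.

```python
# Hypothetical set of part-of-speech tags to keep; others are filtered out.
KEPT_POS = {"noun", "verb", "quantifier"}

# Hypothetical synonym table mapping surface forms to a standard vocabulary entry.
STANDARD_VOCAB = {"Call CALL": "Cheers"}


def filter_and_normalize(tagged_segments):
    """Drop segments whose tag is not kept, then map synonyms to standard words."""
    kept = [word for word, pos in tagged_segments if pos in KEPT_POS]
    return [STANDARD_VOCAB.get(word, word) for word in kept]


# Initial word segments from the example, tagged by hand for illustration.
initial = [
    ("Zhang San", "noun"),
    ("Zheng", "adverb"),
    ("Crazy", "adverb"),
    ("Call CALL", "verb"),
]
sampling_segments = filter_and_normalize(initial)
```

  • Running this on the example leaves `sampling_segments` equal to `["Zhang San", "Cheers"]`, matching the result described in the text.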
  • sampling word frequency is the frequency of occurrence of the sampling word segmentation in the same historical crowdsourcing task.
  • The sampling word frequency can be expressed either as the number of occurrences of a sampling word segment or as the proportion of its occurrences; the choice can be set according to the actual situation.
  • For example, a historical crowdsourcing task corresponds to 8 sampling word segments: Participle 1, Participle 2, Participle 3, Participle 4, Participle 5, Participle 6, Participle 7, and Participle 8. Their numbers of occurrences are 22, 5, 19, 25, 8, 1, 20, and 2 respectively, so, expressed as occurrence counts, the corresponding sampling word frequencies are 22, 5, 19, 25, 8, 1, 20, and 2.
  • S24: Use each sampling word segment whose sampling word frequency is greater than the preset word frequency as a sampling keyword, and store the sampling keywords in the preset answer lexicon.
  • The server stores the preset word frequency and compares each sampling word frequency with it. When a sampling word frequency is greater than the preset word frequency, the corresponding sampling word segment is used as a sampling keyword and stored in the preset answer lexicon.
  • the preset word frequency can be set according to actual sampling needs.
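  • The frequency comparison of step S24 can be sketched with a counter and a threshold. The occurrence counts below are the ones from the example in the text; the threshold value `PRESET_WORD_FREQUENCY = 10` is an assumption chosen only for illustration, since the source leaves it configurable.

```python
from collections import Counter

PRESET_WORD_FREQUENCY = 10  # hypothetical threshold, not given in the source

# Rebuild a flat list of sampling word segments from the example's occurrence counts.
occurrences = [22, 5, 19, 25, 8, 1, 20, 2]
segments = []
for index, count in enumerate(occurrences, start=1):
    segments.extend([f"Participle {index}"] * count)


def extract_sampling_keywords(segments, preset_frequency):
    """Keep the segments whose occurrence count is strictly greater than the threshold."""
    word_frequency = Counter(segments)
    return {word for word, freq in word_frequency.items() if freq > preset_frequency}


answer_lexicon = extract_sampling_keywords(segments, PRESET_WORD_FREQUENCY)
```

  • Under this assumed threshold, Participles 1, 3, 4, and 7 (with 22, 19, 25, and 20 occurrences) become sampling keywords, and the other four are discarded.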
  • In this embodiment, the initial word segments relevant to sampling are obtained, then filtered and synonymously replaced, which further simplifies them into the sampling word segments; extracting keywords from these sampling word segments gives more accurate sampling keywords.
  • In step S22, the specific process of filtering the initial word segments and performing synonymous replacement on the filtered initial word segments to obtain the sampling word segments is described in detail as follows:
  • synonymous replacement is performed on the initial word segmentation after the filtering process to obtain random word segmentation.
  • Named Entity Recognition (NER) identifies entities with specific meanings in text, which involves determining entity boundaries and is closely related to word segmentation. It is an important part of practical natural language processing and plays a basic role in applications such as information extraction, syntactic analysis, and machine translation. Named entity recognition must identify entity boundaries on the one hand and entity categories on the other, such as names of people, places, and organizations.
  • The filtered initial word segments are synonymously replaced and thereby further simplified to obtain the sampling word segments, which improves the accuracy of the sampling word segments.
  • In step S22, the specific implementation process of performing synonymous replacement on the initial word segments by means of named entity recognition to obtain the sampling word segments is described in detail as follows:
  • a standard vocabulary dictionary is set in the server in advance.
  • The standard vocabulary dictionary can effectively filter out relatively redundant vocabulary, such as unwanted prepositions and adjectives, or synonyms and near-synonyms.
  • The recognition result may be that the named entities are the same, or that they are different.
  • For example, two recognized named entities may be different strings on the surface yet both refer to the city of New York, and they need to be merged.
  • Likewise, the two recognized named entities "Call CALL" and "cheer" differ on the surface, but both carry the meaning "cheer", so the entity names need to be merged.
  • The entity recognition result being that the same named entity exists means that the semantics corresponding to the two word segments are the same.
  • For example, one initial word segment is the named entity "Call CALL", and another initial word segment is the named entity "cheer".
  • Since the entity recognition result is the same named entity, both initial word segments correspond to the standard vocabulary entry "cheer"; replacing "Call CALL" and "cheer" with "cheer" yields the sampling word segment "cheer".
  • Figure 5 shows a specific implementation of step S3.
  • In step S3, the specific implementation process of counting, for the sampling word segments in each response answer of each response object, the number of times the sampling word segments hit the sampling keywords in the preset answer lexicon, as the basic number of times corresponding to each response answer, is described in detail as follows:
  • each historical crowdsourcing task that each respondent participates in is stored in the server.
  • the response answer of each responding object can be included.
  • By taking each historical crowdsourcing task that the response object participated in as a reference task and determining the basic number of times, a basis is provided for the subsequent determination of the reliability value.
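  • One way to compute the basic number of times per reference task is sketched below. It assumes that a "hit" means a distinct sampling keyword appearing among the answer's sampling word segments, an interpretation chosen here because it keeps the hit total no larger than the keyword total; the source does not spell this out, and the function name and sample data are hypothetical.

```python
def basic_number_of_times(answer_segments, task_keywords):
    """Count the distinct sampling keywords hit by one response answer."""
    return len(set(answer_segments) & set(task_keywords))


# Hypothetical data: one response object's segmented answers to two reference tasks.
answers = {
    "task_a": ["Zhang San", "Cheers", "Cheers"],
    "task_b": ["weather", "sunny"],
}
# Hypothetical sampling keywords stored for each reference task.
lexicon = {
    "task_a": {"Zhang San", "Cheers", "concert"},
    "task_b": {"rain", "umbrella"},
}
basic_counts = {
    task_id: basic_number_of_times(segments, lexicon[task_id])
    for task_id, segments in answers.items()
}
```

  • Here the answer to `task_a` hits two of the three keywords and the answer to `task_b` hits none, so `basic_counts` is `{"task_a": 2, "task_b": 0}`.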
  • Figure 6 shows a specific implementation of step S4.
  • in step S4, the specific implementation of determining the reliability value corresponding to each respondent according to the base count corresponding to each response answer is described in detail as follows:
  • S41: According to the base counts, count the total number M of sampling keywords corresponding to the reference tasks, and count the total number N of hits by the respondent across the reference tasks, where M and N are both positive integers and N is less than or equal to M.
  • for example, the total M of sampling keywords corresponding to the reference tasks is 10 and, for one respondent, the total number of hits N across the reference tasks is 2; by the formula reliability value = N/M, the reliability value is 0.2. For another respondent, the total number of hits N is 4, so the reliability value is 0.4. Because a higher reliability value means a lower risk in the response answers that the respondent gives, the respondent with reliability value 0.4 carries less risk than the respondent with reliability value 0.2.
  • the computer program can be stored in a computer readable storage medium. When executed, it may include the procedures of the above-mentioned method embodiments.
  • the aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a read-only memory (ROM), or a random access memory (RAM), etc.
  • this application provides an embodiment of a crowdsourced task sampling device.
  • the device embodiment corresponds to the method embodiment shown in FIG.
  • the device can be applied to various electronic devices.
  • the sampling inspection apparatus for crowdsourcing tasks in this embodiment includes: an obtaining module 51, a parsing module 52, a statistics module 53, a determination module 54, and a selection module 55, where:
  • the obtaining module 51 is configured to obtain, for each historical crowdsourcing task, the response answer of each respondent corresponding to the historical crowdsourcing task.
  • the parsing module 52 is used to parse the response answers to obtain sampled word segments, extract sampling keywords from the sampled word segments, and store the sampling keywords in a preset answer vocabulary.
  • the statistics module 53 is used to count, for the sampled word segments in each response answer of each respondent, the number of times the sampled word segments hit the sampling keywords in the preset answer vocabulary, as the base count corresponding to each response answer.
  • the determination module 54 is used to determine the reliability value corresponding to each respondent according to the base count corresponding to each response answer.
  • the selection module 55 is used to select a preset number of respondents as sampling objects in ascending order of reliability value, and to check the response answers corresponding to the sampling objects.
  • the parsing module 52 includes:
  • the word segmentation unit, used to perform word segmentation on the response answers using a dynamic programming algorithm to obtain initial word segments;
  • the sampled word segment determination unit, used to filter the initial word segments and perform synonym replacement on the filtered initial word segments to obtain the sampled word segments;
  • the sampled word frequency determination unit, used to obtain, for the same historical crowdsourcing task, all sampled word segments corresponding to the historical crowdsourcing task, and to count the number of occurrences of each sampled word segment to obtain the sampled word frequency corresponding to the sampled word segment;
  • the sampling keyword determination unit, used to take the sampled word segments whose sampled word frequency is greater than a preset word frequency as sampling keywords, and to store the sampling keywords in the preset answer vocabulary.
  • the sampled word segment determination unit includes:
  • the named entity recognition subunit, used to perform synonym replacement on the initial word segments by means of named entity recognition to obtain the sampled word segments.
  • the sampled word segment determination unit further includes:
  • the standard vocabulary dictionary acquisition subunit, used to acquire a preset standard vocabulary dictionary;
  • the entity recognition result determination subunit, used to perform named entity recognition on each initial word segment by traversing the initial word segments against each word in the standard vocabulary dictionary, to obtain an entity recognition result;
  • the standard word substitution subunit, used to obtain, if the entity recognition result is that the same named entity exists, the initial word segment and the standard word corresponding to the recognition result, and to replace the initial word segment with the standard word.
  • the statistics module 53 includes:
  • the reference task determination unit, used to obtain each historical crowdsourcing task that the respondent participated in as a reference task;
  • the base count determination unit, used to obtain, for each reference task, the sampling keywords from the preset answer vocabulary, and to count the number of times the sampled word segments in the respondent's response answer hit the sampling keywords, as the base count.
  • the determination module 54 includes:
  • the base count statistics unit, used to count, according to the base counts, the total number M of sampling keywords corresponding to the reference tasks, and the total number N of hits by the respondent across the reference tasks, where M and N are both positive integers and N is less than or equal to M;
  • the reliability value determination unit, used to calculate the reliability value using the formula reliability value = N/M.
  • in the sampling inspection apparatus for crowdsourcing tasks in the above scheme, the obtaining module 51 obtains, for each historical crowdsourcing task, the response answer of each respondent corresponding to the historical crowdsourcing task; the parsing module 52 parses the response answers to obtain sampled word segments, extracts sampling keywords from the sampled word segments, and stores the sampling keywords in the preset answer vocabulary. Reducing the numerous sampled word segments to more targeted sampling keywords can effectively improve sampling inspection efficiency.
  • the statistics module 53 counts, for the sampled word segments in each response answer of each respondent, the number of times the sampled word segments hit the sampling keywords, as the base count of each response answer, which allows each respondent to be characterized by a corresponding reliability value and makes the sampling inspection more targeted; the determination module 54 determines the reliability value corresponding to each respondent according to the base count corresponding to each response answer, and the selection module 55 then selects a preset number of respondents in ascending order of reliability value as sampling objects, which makes the sampling inspection more targeted and helps improve its efficiency.
  • FIG. 8 is a block diagram of the basic structure of the computer device in this embodiment.
  • the computer device 6 includes a memory 61, a processor 62, and a network interface 63 that are communicatively connected to one another via a system bus. It should be pointed out that the figure only shows a computer device 6 with three components: the memory 61, the processor 62, and the network interface 63; it is not required to implement all of the illustrated components, and more or fewer components may be implemented instead. Those skilled in the art will understand that the computer device here is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions. Its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), embedded devices, and so on.
  • Computer equipment can be computing equipment such as desktop computers, notebooks, palmtop computers, and cloud servers.
  • the computer equipment can interact with the user through a keyboard, a mouse, a remote control, a touch panel, or a voice control device.
  • the memory 61 includes at least one type of readable storage medium.
  • the readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical disks, etc.
  • the memory 61 may be an internal storage unit of the computer device 6, such as a hard disk or a memory of the computer device 6.
  • the memory 61 may also be an external storage device of the computer device 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card.
  • the memory 61 may also include both the internal storage unit of the computer device 6 and its external storage device.
  • the memory 61 is generally used to store an operating system and various application software installed in the computer device 6, such as computer readable instructions for a crowdsourced task sampling method.
  • the memory 61 may also be used to temporarily store various types of data that have been output or will be output.
  • the processor 62 may, in some embodiments, be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip.
  • the processor 62 is generally used to control the overall operation of the computer device 6.
  • the processor 62 is configured to run computer-readable instructions or process data stored in the memory 61, for example, to run the computer-readable instructions of the sampling inspection method for crowdsourcing tasks.
  • the network interface 63 may include a wireless network interface or a wired network interface, and the network interface 63 is generally used to establish a communication connection between the computer device 6 and other electronic devices.
  • the computer-readable storage medium may be non-volatile or volatile.
  • the computer-readable storage medium stores a sampling inspection process, and the sampling inspection process can be executed by at least one processor, so that the at least one processor executes the steps of the aforementioned crowdsourced task sampling method.
  • the technical solution of this application, in essence or in the part that contributes to the existing technology, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions to enable a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to execute the methods of the embodiments of the present application.
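The reliability computation and the ascending-order selection described in the list above can be sketched as follows. This is a minimal illustration, assuming the reliability value is N/M as implied by the worked figures (M = 10 with N = 2 gives 0.2, N = 4 gives 0.4); the function names `reliability` and `pick_sampling_objects` and the respondent labels are hypothetical and do not come from the application.

```python
def reliability(hits_per_task, keywords_per_task):
    """Reliability value N / M over a respondent's reference tasks.

    M: total number of sampling keywords across the reference tasks.
    N: total number of keyword hits by the respondent across those tasks.
    """
    m = sum(keywords_per_task)
    n = sum(hits_per_task)
    if m <= 0 or n > m:
        raise ValueError("expected M > 0 and N <= M")
    return n / m


def pick_sampling_objects(reliabilities, preset_count):
    """Select the `preset_count` respondents with the lowest reliability."""
    ranked = sorted(reliabilities, key=reliabilities.get)  # ascending order
    return ranked[:preset_count]


# Hypothetical respondents; two reference tasks with 6 and 4 keywords (M = 10).
scores = {
    "respondent_a": reliability([1, 1], [6, 4]),  # N = 2
    "respondent_b": reliability([3, 1], [6, 4]),  # N = 4
}
to_check = pick_sampling_objects(scores, preset_count=1)
```

Sorting in ascending order puts the least reliable respondents first, so their response answers are the ones checked.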

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Engineering & Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A crowdsourced task inspection method, an apparatus, a computer device, and a storage medium, applied in the field of big data databases. The method comprises: with respect to each historically crowdsourced task, obtaining each responder that participated in the historically crowdsourced tasks as well as each response answer corresponding to each responder (S1); performing parsing on the response answers, obtaining inspection tokens, extracting inspection keywords from the inspection tokens, and storing the inspection keywords in a preset answer word base (S2); with respect to the inspection tokens from each response answer of each responder, tabulating frequencies of inspection token inspection keyword hits in the preset answer word base, which serve as base frequencies corresponding to each response answer (S3), and determining a reliability value corresponding to each responder according to the base frequencies corresponding to each response answer (S4); determining inspection targets according to the reliability values, and performing evaluation operations on response answers corresponding to the inspection targets (S5). The present method allows for extraction of inspection keywords, strengthening the degree of targeting in inspection, and improving the effect of inspection.

Description

Sampling inspection method, apparatus, computer device, and storage medium for crowdsourcing tasks
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on March 2, 2020, with application number 202010134385.6 and entitled "Sampling inspection method, apparatus, computer device, and storage medium for crowdsourcing tasks", the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of data processing technology, and in particular to a sampling inspection method, apparatus, computer device, and storage medium for crowdsourcing tasks.
Background
With the rapid development of network technology, companies and institutions that want to gather more creative input, or to solve cross-domain problems efficiently and conveniently, often issue crowdsourcing tasks to Internet users over the Internet and solve these problems through crowdsourcing.
Crowdsourcing refers to the practice of a company or institution outsourcing work formerly performed by employees, on a free and voluntary basis, to a non-specific (and usually large) public network. Participants on a crowdsourcing platform fall into two categories: those who publish tasks on the platform are called task publishers, and those who complete tasks are called respondents. A task publisher publishes a task on the platform, and respondents earn a reward by completing it. Crowdsourcing helps task publishers reach a large pool of independent respondents and solve practical problems by drawing on their collective intelligence.
At present, the inventors have found that, because respondents' fields of expertise and levels of professionalism are uncertain, the correctness of the response answers collected for a crowdsourcing task needs to be checked by sampling. When many respondents take part, that is, when many response answers are collected, checking takes a long time. The current practice is random sampling: a preset number of response answers is drawn at random from all response answers and checked, and the crowdsourcing task is evaluated according to the check results. The inventors have found that random sampling is weakly targeted, so the evaluation of the crowdsourcing task is unsatisfactory and sampling inspection efficiency is low. How to sample crowdsourcing tasks in a targeted manner and improve sampling inspection efficiency has become an urgent problem.
Technical problem
The purpose of the embodiments of the present application is to propose a sampling inspection method for crowdsourcing tasks that solves the problem in the prior art that random sampling is weakly targeted, which leads to low sampling inspection efficiency for crowdsourcing tasks.
Technical solution
To solve the above technical problem, an embodiment of the present application provides a sampling inspection method for crowdsourcing tasks, comprising:
for each historical crowdsourcing task, obtaining each respondent that participated in the historical crowdsourcing task, and the response answer corresponding to each respondent;
parsing the response answers to obtain sampled word segments, extracting sampling keywords from the sampled word segments, and storing the sampling keywords in a preset answer vocabulary;
for the sampled word segments in each response answer of each respondent, counting the number of times the sampled word segments hit the sampling keywords in the preset answer vocabulary, as the base count corresponding to each response answer;
determining the reliability value corresponding to each respondent according to the base count corresponding to each response answer;
selecting a preset number of respondents in ascending order of reliability value as sampling objects, and checking the response answers corresponding to the sampling objects.
To solve the above technical problem, one technical solution adopted in this application is to provide a sampling inspection apparatus for crowdsourcing tasks, comprising:
an obtaining module, used to obtain, for each historical crowdsourcing task, the response answer of each respondent corresponding to the historical crowdsourcing task;
a parsing module, used to parse the response answers to obtain sampled word segments, extract sampling keywords from the sampled word segments, and store the sampling keywords in a preset answer vocabulary;
a statistics module, used to count, for the sampled word segments in each response answer of each respondent, the number of times the sampled word segments hit the sampling keywords in the preset answer vocabulary, as the base count corresponding to each response answer;
a determination module, used to determine the reliability value corresponding to each respondent according to the base count corresponding to each response answer;
a selection module, used to select a preset number of respondents in ascending order of reliability value as sampling objects, and to check the response answers corresponding to the sampling objects.
To solve the above technical problem, one technical solution adopted in this application is to provide a computer device, comprising a memory and a processor, where the memory stores computer-readable instructions that run on the processor, and the processor, when executing the computer-readable instructions, implements the following steps of the sampling inspection method for crowdsourcing tasks:
for each historical crowdsourcing task, obtaining each respondent that participated in the historical crowdsourcing task, and the response answer corresponding to each respondent;
parsing the response answers to obtain sampled word segments, extracting sampling keywords from the sampled word segments, and storing the sampling keywords in a preset answer vocabulary;
for the sampled word segments in each response answer of each respondent, counting the number of times the sampled word segments hit the sampling keywords in the preset answer vocabulary, as the base count corresponding to each response answer;
determining the reliability value corresponding to each respondent according to the base count corresponding to each response answer;
selecting a preset number of respondents in ascending order of reliability value as sampling objects, and checking the response answers corresponding to the sampling objects.
To solve the above technical problem, one technical solution adopted in this application is a computer-readable storage medium storing computer-readable instructions which, when executed by a processor, implement the following steps of the sampling inspection method for crowdsourcing tasks:
for each historical crowdsourcing task, obtaining each respondent that participated in the historical crowdsourcing task, and the response answer corresponding to each respondent;
parsing the response answers to obtain sampled word segments, extracting sampling keywords from the sampled word segments, and storing the sampling keywords in a preset answer vocabulary;
for the sampled word segments in each response answer of each respondent, counting the number of times the sampled word segments hit the sampling keywords in the preset answer vocabulary, as the base count corresponding to each response answer;
determining the reliability value corresponding to each respondent according to the base count corresponding to each response answer;
selecting a preset number of respondents in ascending order of reliability value as sampling objects, and checking the response answers corresponding to the sampling objects.
Beneficial effects
In the sampling inspection method for crowdsourcing tasks in the above scheme, for each historical crowdsourcing task, the response answer of each respondent corresponding to the historical crowdsourcing task is obtained; the response answers are parsed to obtain sampled word segments; sampling keywords are extracted from the sampled word segments and stored in a preset answer vocabulary. The resulting sampling keywords are used later to evaluate the reliability value of each respondent, which makes that evaluation more targeted. In addition, for the sampled word segments in each response answer of each respondent, the number of times the sampled word segments hit the sampling keywords is counted as the base count corresponding to each response answer; the reliability value corresponding to each respondent is determined from the base counts; the sampling objects are then determined according to the reliability values, and the response answers corresponding to the sampling objects are checked. Determining a reliability value for each respondent and choosing the sampling objects by reliability value makes the sampling inspection more targeted, and ordering the respondents by reliability value helps improve sampling inspection efficiency.
Description of the drawings
To explain the solutions in this application more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show some embodiments of the application, and those of ordinary skill in the art can obtain other drawings from them without creative work.
FIG. 1 is a schematic diagram of the application environment of the sampling inspection method for crowdsourcing tasks provided by an embodiment of the present application;
FIG. 2 is an implementation flowchart of the sampling inspection method for crowdsourcing tasks provided by an embodiment of the present application;
FIG. 3 is an implementation flowchart of step S2 in the sampling inspection method for crowdsourcing tasks provided by an embodiment of the present application;
FIG. 4 is an implementation flowchart of step S221 in the sampling inspection method for crowdsourcing tasks provided by an embodiment of the present application;
FIG. 5 is an implementation flowchart of step S3 in the sampling inspection method for crowdsourcing tasks provided by an embodiment of the present application;
FIG. 6 is an implementation flowchart of step S4 in the sampling inspection method for crowdsourcing tasks provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of the sampling inspection apparatus for crowdsourcing tasks provided by an embodiment of the present application;
FIG. 8 is a schematic diagram of the computer device provided by an embodiment of the present application.
Best mode of the invention
Referring to FIG. 3, which shows a specific implementation of step S2: in step S2, the response answers are parsed to obtain sampled word segments, sampling keywords are extracted from the sampled word segments, and the sampling keywords are stored in the preset answer vocabulary. The specific implementation is described in detail as follows:
S21: Use a dynamic programming algorithm to perform word segmentation on the response answers to obtain initial word segments.
Specifically, a respondent's historical response answers are often verbose and need to be simplified. A dynamic programming algorithm is used to segment the response answers into words, and the word segments relevant to the sampling inspection are extracted to obtain the initial word segments.
Dynamic programming is typically used to solve problems that have some optimality property; in this application, it is used to obtain the optimal initial word segmentation. Its basic idea is to decompose the problem to be solved into several sub-problems, solve the sub-problems first, and then obtain the solution of the original problem from the solutions of the sub-problems.
Preferably, the Viterbi algorithm is used to perform word segmentation on the response answers to obtain the initial word segments. The Viterbi algorithm is a dynamic programming algorithm for finding the most likely explanation of a sequence of observations.
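The dynamic programming idea in S21 can be illustrated with a small Viterbi-style segmenter. This is a minimal sketch under stated assumptions, not the application's implementation: the unigram dictionary `WORD_LOGP`, its probabilities, the unknown-character penalty, and the function name `segment` are all hypothetical.

```python
import math

# Hypothetical unigram model: word -> log-probability (assumed values).
WORD_LOGP = {
    "new": math.log(0.02),
    "york": math.log(0.01),
    "newyork": math.log(0.015),
    "is": math.log(0.05),
    "big": math.log(0.02),
}
UNKNOWN_LOGP = math.log(1e-8)  # assumed penalty for one unknown character
MAX_WORD_LEN = max(len(w) for w in WORD_LOGP)


def segment(text: str) -> list[str]:
    """Return the highest-scoring segmentation of `text` (Viterbi-style DP)."""
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best score for text[:i]
    back = [0] * (n + 1)          # back[i]: start of the last word in text[:i]
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            piece = text[start:end]
            if piece in WORD_LOGP:
                score = best[start] + WORD_LOGP[piece]
            elif end - start == 1:
                # Fall back to a single unknown character.
                score = best[start] + UNKNOWN_LOGP
            else:
                continue
            if score > best[end]:
                best[end], back[end] = score, start
    # Follow the back pointers from the end to recover the words.
    words, i = [], n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]


initial_segments = segment("newyorkisbig")
```

Each prefix of the text is a sub-problem: the best split of `text[:end]` is built from the best split of a shorter prefix plus one more word, which is the decomposition described above.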
S22: Filter the initial word segments, and perform synonym replacement on the filtered initial word segments to obtain the sampled word segments.
Specifically, the initial word segments obtained from parsing are filtered to discard unnecessary or redundant words, leaving word segments that better fit the needs of the sampling inspection. Synonym replacement is then applied to the filtered initial word segments to obtain the sampled word segments. The purpose of synonym replacement is to convert synonyms and near-synonyms among the word segments into one consistent standard word, which further simplifies the word segments.
Here, filtering means removing initial word segments whose part of speech has little effect on the semantic description, such as adjectives and adverbs, and retaining initial word segments that play a key role in expressing meaning, such as nouns, verbs, and quantifiers.
Synonym replacement means representing synonyms and near-synonyms by one unified standard word.
For example, in a specific embodiment, the initial word segments obtained are "Zhang San", "currently", "crazily", and "打CALL". "Currently" and "crazily" are both adverbs and can be filtered out, while "Zhang San" is a specific person's name and "打CALL" denotes a specific action. After filtering, the two word segments "Zhang San" and "打CALL" remain; synonym replacement is then applied to "打CALL", giving the sampled word segments "Zhang San" and "cheer".
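The filtering and synonym replacement in S22 can be sketched on the example above. This is an illustration only: it assumes each initial word segment already carries a part-of-speech label, and the label names, the `STANDARD_WORD` table, and the function name `to_sampled_segments` are hypothetical.

```python
# Parts of speech assumed to carry little meaning (labels are illustrative).
FILTERED_POS = {"adverb", "adjective"}

# Hypothetical synonym table mapping variants to one standard word.
STANDARD_WORD = {"打CALL": "cheer"}


def to_sampled_segments(tagged_segments):
    """Drop low-information segments, then map synonyms to standard words."""
    kept = [seg for seg, pos in tagged_segments if pos not in FILTERED_POS]
    return [STANDARD_WORD.get(seg, seg) for seg in kept]


initial = [
    ("Zhang San", "noun"),
    ("currently", "adverb"),
    ("crazily", "adverb"),
    ("打CALL", "verb"),
]
sampled = to_sampled_segments(initial)
```

Word segments with no entry in the synonym table pass through unchanged, so only known variants are normalized.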
S23: For the same historical crowdsourcing task, obtain all sampled word segments corresponding to that task and count the occurrences of each sampled word segment to obtain its sampling word frequency.
Specifically, for a given historical crowdsourcing task, all sampled word segments corresponding to that task are obtained, and the occurrences of each distinct sampled word segment are counted to obtain the sampling word frequency of that word segment within the task.
The sampling word frequency is the frequency with which a sampled word segment occurs within the same historical crowdsourcing task. It may be expressed as the number of occurrences of the sampled word segment or as the proportion of its occurrences, and can be set according to the actual situation.
For example, in a specific embodiment, a historical crowdsourcing task corresponds to eight sampled word segments: segment 1, segment 2, segment 3, segment 4, segment 5, segment 6, segment 7, and segment 8. Their occurrence counts are, in order, 22, 5, 19, 25, 8, 1, 20, and 2, so the corresponding sampling word frequencies are 22, 5, 19, 25, 8, 1, 20, and 2.
S24: Take each sampled word segment whose sampling word frequency is greater than a preset word frequency as a sampling keyword, and store the sampling keywords in the preset answer lexicon.
Specifically, the server stores a preset word frequency and compares each sampling word frequency against it. When a sampling word frequency is greater than the preset word frequency, the corresponding sampled word segment is taken as a sampling keyword and stored in the preset answer lexicon.
The preset word frequency can be set according to actual sampling needs.
In this embodiment, word segmentation of the answers yields the initial word segments relevant to the sampling inspection; filtering and synonym replacement further simplify them into the sampled word segments; and extracting keywords from the sampled word segments yields more precise sampling keywords.
Embodiments of the present invention
Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field of this application. The terms used in the specification are only for the purpose of describing specific embodiments and are not intended to limit this application. The terms "including" and "having", and any variations thereof, in the specification, the claims, and the above description of the drawings are intended to cover non-exclusive inclusion. The terms "first", "second", and the like in the specification, the claims, or the above drawings are used to distinguish different objects, not to describe a particular order.
Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of this application. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
To enable those skilled in the art to better understand the solutions of this application, the technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings.
This application is described in detail below with reference to the drawings and embodiments.
Referring to FIG. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links, or fiber-optic cables.
A user may use the terminal devices 101, 102, and 103 to interact with the server 105 through the network 104 to receive or send messages. Various communication client applications, such as web browsers, search applications, and instant messaging tools, may be installed on the terminal devices 101, 102, and 103.
The terminal devices 101, 102, and 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers.
The server 105 may be a server that provides various services, for example a back-end server that supports the pages displayed on the terminal devices 101, 102, and 103.
It should be noted that the sampling inspection method for crowdsourcing tasks provided by the embodiments of this application is generally executed by the server; accordingly, the sampling inspection apparatus for crowdsourcing tasks is generally arranged in the server.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers, as the implementation requires.
Referring to FIG. 2, FIG. 2 shows a specific embodiment of the sampling inspection method for crowdsourcing tasks.
It should be noted that, provided substantially the same result is obtained, the method of this application is not limited to the order of the flow shown in FIG. 2. The method includes the following steps:
S1: For each historical crowdsourcing task, obtain each respondent that participated in the task and the answer corresponding to each respondent.
Specifically, the server stores each historical crowdsourcing task, respondent, and answer, together with the mapping relationships among respondents, answers, and historical crowdsourcing tasks. Every historical crowdsourcing task has at least one answer, and each respondent has at most one answer for the same historical crowdsourcing task. For each historical crowdsourcing task, the answer of each respondent to that task is obtained.
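By way of illustration only, the mapping described in step S1 could be held in a nested dictionary. The record layout and the names `records`, `answers_by_task`, `task_A`, `model_K`, and `model_M` below are assumptions made for this sketch and do not come from the application.

```python
from collections import defaultdict

# Hypothetical stored records: (task_id, respondent_id, answer_text).
records = [
    ("task_A", "model_K", "answer text 1"),
    ("task_A", "model_M", "answer text 2"),
    ("task_B", "model_K", "answer text 3"),
]

def answers_by_task(rows):
    """Build task -> {respondent -> answer}.

    Enforces the constraint from step S1 that a respondent has at most
    one answer for the same historical crowdsourcing task.
    """
    mapping = defaultdict(dict)
    for task_id, respondent_id, answer in rows:
        if respondent_id in mapping[task_id]:
            raise ValueError(f"duplicate answer: {respondent_id} on {task_id}")
        mapping[task_id][respondent_id] = answer
    return dict(mapping)

task_answers = answers_by_task(records)
```

Keying first by task and then by respondent makes the "at most one answer per respondent per task" rule a simple key-collision check.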
In this embodiment, a crowdsourcing task is a network-based task in which objects participate over a network and give corresponding answers.
A respondent is an object that gives an answer to a historical crowdsourcing task, and may specifically be one of a plurality of preset network models. For example, if a network model K identifies and parses crowdsourcing task A in a preset manner and gives an answer, network model K may be called a respondent of crowdsourcing task A.
In this embodiment, the electronic device on which the sampling inspection method for crowdsourcing tasks runs (for example, the server shown in FIG. 1) may be connected via a wired connection or a wireless connection. It should be pointed out that the wireless connection may include, but is not limited to, 3G/4G, WiFi, Bluetooth, WiMAX, Zigbee, and UWB (ultra wideband) connections, as well as other wireless connection methods now known or developed in the future.
S2: Parse the answers to obtain sampled word segments, extract sampling keywords from the sampled word segments, and store the sampling keywords in a preset answer lexicon.
Specifically, the answers are parsed to obtain word segments that express the semantics of the answers; these are taken as the sampled word segments. Keywords are extracted from the sampled word segments as sampling keywords and stored in the preset answer lexicon. For the specific process, refer to the description of steps S21 to S24; to avoid repetition, it is not described again here.
The parsing specifically includes, but is not limited to, word segmentation, data cleaning, deduplication, and synonym replacement.
Word segmentation is the process of recombining a continuous character sequence into a word sequence according to certain rules. In this embodiment, it specifically means splitting an answer into individual word segments so that sampling keywords can later be extracted from them.
Word segmentation may be performed with a third-party word segmentation tool or with a word segmentation algorithm.
Common third-party word segmentation tools include, but are not limited to, the Stanford NLP segmenter, the ICTCLAS word segmentation system, the ansj word segmentation tool, and the HanLP Chinese word segmentation tool.
Word segmentation algorithms include, but are not limited to, the forward Maximum Matching (MM) algorithm, the Reverse Maximum Matching (RMM) algorithm, the Bi-directional Maximum Matching (BM) algorithm, dynamic programming algorithms, the Hidden Markov Model (HMM), and the N-gram model.
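As a hedged illustration of the simplest algorithm in the list above, forward maximum matching can be sketched as follows. The function name `forward_max_match`, the toy dictionary, and the `max_len` window are assumptions made for this example; the application does not prescribe them.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy forward maximum matching over a word dictionary.

    At each position the longest dictionary word is taken; a single
    character is emitted when no dictionary word matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens

# Toy dictionary (hypothetical).
toy_dictionary = {"研究", "研究生", "生命", "起源"}
tokens = forward_max_match("研究生命起源", toy_dictionary)
# tokens == ["研究生", "命", "起源"]
```

The greedy choice of "研究生" leaves the stray character "命", which illustrates why a globally optimal method may be preferred over pure maximum matching.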
Preferably, this embodiment uses a dynamic programming algorithm for word segmentation. For the specific process, refer to the description of step S21; to avoid repetition, it is not described again here.
Data cleaning is a procedure for finding and correcting identifiable errors in data files, including checking data consistency and handling invalid and missing values. In this embodiment, the answers and word segments are checked for textual conformity and invalid items are removed, which improves the accuracy and efficiency of the subsequent extraction of sampling keywords.
It should be noted that, as a preferred approach in this embodiment, storing the sampling keywords in the preset answer lexicon specifically includes: obtaining the preset task type corresponding to each historical crowdsourcing task; taking the sampling keywords of historical crowdsourcing tasks with the same preset task type as one group of sampling keywords; and establishing a mapping relationship between each group of sampling keywords and its preset task type, and storing that mapping relationship in the preset answer lexicon.
The preset task types can be set according to actual needs and are not specifically limited here. For example, in one specific embodiment the preset task types include question-and-answer, multiple-choice, fill-in-the-blank, and true-or-false questions; in another specific embodiment they include material collection, artwork production, planning, and promotional design.
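The grouping of sampling keywords by preset task type described above might be sketched as follows. The identifiers `task_types`, `task_keywords`, `group_keywords_by_task_type`, and the placeholder keywords are hypothetical and serve only this example.

```python
from collections import defaultdict

def group_keywords_by_task_type(task_types, task_keywords):
    """Merge the keywords of all tasks sharing a task type into one group.

    task_types:    task_id -> preset task type
    task_keywords: task_id -> set of sampling keywords
    Returns the answer lexicon as task type -> set of sampling keywords.
    """
    lexicon = defaultdict(set)
    for task_id, keywords in task_keywords.items():
        lexicon[task_types[task_id]].update(keywords)
    return dict(lexicon)

# Hypothetical tasks and keywords.
task_types = {"t1": "fill-in-the-blank", "t2": "multiple-choice", "t3": "fill-in-the-blank"}
task_keywords = {"t1": {"kw_a", "kw_b"}, "t2": {"kw_c"}, "t3": {"kw_b", "kw_d"}}
answer_lexicon = group_keywords_by_task_type(task_types, task_keywords)
```

Storing sets per task type means a keyword shared by several tasks of the same type is kept once in its group.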
By parsing the answers to obtain sampled word segments and extracting sampling keywords from them, sampling keywords can be extracted from the answers of every respondent across a large number of historical tasks, which makes the sampling inspection more targeted and improves its efficiency.
S3: For the sampled word segments in each answer of each respondent, count the number of times the sampled word segments hit sampling keywords in the preset answer lexicon, and take that number as the base count corresponding to each respondent.
Specifically, the number of times the sampled word segments hit sampling keywords in the preset answer lexicon is counted to obtain a base count that expresses the reliability of the sampled word segments. For the specific process, refer to the description of steps S31 and S32; to avoid repetition, it is not described again here.
The reliability value evaluates, on the basis of an object's answers to historical crowdsourcing tasks, how reliable that object's responses to crowdsourcing tasks are. In general, the higher the reliability value, the lower the risk of the answers that the object returns.
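A minimal sketch of the hit counting in step S3 is given below. It assumes that every occurrence of a sampled word segment found in the lexicon counts as one hit; the details of steps S31 and S32 are not reproduced here, and the names `base_count` and `base_counts_per_respondent` are hypothetical.

```python
def base_count(segments, keywords):
    """Number of sampled word segments in one answer that hit a keyword.

    Assumption: repeated occurrences are each counted as a hit.
    """
    return sum(1 for segment in segments if segment in keywords)

def base_counts_per_respondent(answers, keywords):
    """answers: respondent -> list of answers, each a list of segments."""
    return {
        respondent: [base_count(segments, keywords) for segments in segment_lists]
        for respondent, segment_lists in answers.items()
    }

# Hypothetical lexicon and answers.
keywords = {"张三", "欢呼"}
answers = {
    "model_K": [["张三", "欢呼"], ["李四"]],
    "model_M": [["欢呼", "欢呼"]],
}
counts = base_counts_per_respondent(answers, keywords)
# counts == {"model_K": [2, 0], "model_M": [2]}
```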
S4: Determine the reliability value corresponding to each respondent according to the base count corresponding to each answer.
Specifically, each answer corresponds to a respondent, and the reliability value of each respondent can be determined from the base counts of its answers. The reliability values are used to determine the order of the subsequent sampling inspection, which avoids purely random sampling and makes the inspection more targeted. For the specific process of step S4, refer to the description of steps S41 and S42; to avoid repetition, it is not described again here.
S5: In ascending order of reliability value, select a preset number of respondents as inspection targets, and check the answers corresponding to the inspection targets.
Specifically, the respondents are sorted in ascending order of reliability value to obtain a respondent sequence; a preset number of respondents are then selected from the sequence, from front to back, as the targets of the sampling inspection.
The preset number of respondents is set according to actual inspection needs and is not specifically limited here. For example, the number can be set according to the manpower available for inspection: if the inspectors can check only 100 selected objects, the first 100 respondents in the sequence, that is, the 100 with the lowest reliability values, are selected.
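The selection in step S5 can be illustrated with the short sketch below, which takes the reliability values as given. The function name `select_inspection_targets`, the sample values, and the tie-break by respondent name are assumptions made for this example.

```python
def select_inspection_targets(reliability, preset_number):
    """Pick the respondents with the lowest reliability values.

    reliability: respondent -> reliability value
    Ties are broken by respondent name (an assumption of this sketch).
    """
    ranked = sorted(reliability, key=lambda r: (reliability[r], r))
    return ranked[:preset_number]

# Hypothetical reliability values.
reliability = {"model_K": 0.82, "model_M": 0.35, "model_N": 0.61, "model_P": 0.35}
targets = select_inspection_targets(reliability, 2)
# targets == ["model_M", "model_P"]
```

Sorting in ascending order places the least reliable respondents at the front of the sequence, so a limited inspection budget is spent where errors are most likely.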
The sampling inspection results are sent to the terminal devices 101, 102, and 103 so that the respondents can learn the results.
In this embodiment, for each historical crowdsourcing task, the answer of each respondent to that task is obtained and parsed to obtain sampled word segments; sampling keywords are extracted from the sampled word segments and stored in the preset answer lexicon. The sampling keywords are later used to evaluate the reliability values of the respondents, which makes that evaluation more targeted. For the sampled word segments in each answer of each respondent, the number of times they hit sampling keywords is counted as the base count corresponding to each answer; the reliability value of each respondent is determined from the base counts; the inspection targets are then determined according to the reliability values; and the answers of the inspection targets are checked. Determining a reliability value for each respondent and choosing the inspection targets accordingly makes the inspection more targeted, and ranking the respondents by reliability value avoids random sampling, which helps improve inspection efficiency.
Referring to FIG. 3, FIG. 3 shows a specific embodiment of step S2. The specific implementation of parsing the answers to obtain sampled word segments, extracting sampling keywords from the sampled word segments, and storing the sampling keywords in the preset answer lexicon in step S2 is described in detail as follows:
S21: Use a dynamic programming algorithm to perform word segmentation on the answers to obtain initial word segments.
Specifically, the historical answers of a respondent are often verbose and need to be simplified. A dynamic programming algorithm is used to segment the answers and extract the word segments relevant to the sampling inspection, giving the initial word segments.
Dynamic programming algorithms are generally used to solve problems with an optimality property; in this application, a dynamic programming algorithm is used to obtain the optimal initial word segments. The basic idea of dynamic programming is to decompose the problem to be solved into several sub-problems, solve the sub-problems first, and then obtain the solution of the original problem from the solutions of the sub-problems.
Preferably, the Viterbi algorithm is used to perform word segmentation on the answers to obtain the initial word segments. The Viterbi algorithm is a dynamic programming algorithm for finding the sequence of hidden states most likely to explain a sequence of observations.
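For illustration, dynamic-programming segmentation can be sketched as a best-path search over a unigram word dictionary, in the style of Viterbi decoding over a word lattice. This is a simplified stand-in, not the HMM-based Viterbi decoder the application may intend; the log-probabilities, `max_len`, and the penalty `unknown_logprob` are all assumed values.

```python
import math

def dp_segment(text, word_logprob, max_len=4, unknown_logprob=-20.0):
    """Return the segmentation of `text` with the highest total log-probability.

    Sub-problem: best[i] is the best score for the prefix text[:i].
    Unknown single characters are allowed with a fixed penalty.
    """
    n = len(text)
    best = [-math.inf] * (n + 1)
    back = [0] * (n + 1)  # back[i]: start index of the last word on the best path to i
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in word_logprob:
                score = word_logprob[word]
            elif len(word) == 1:
                score = unknown_logprob
            else:
                continue
            if best[start] + score > best[end]:
                best[end] = best[start] + score
                back[end] = start
    # Recover the words by following the back-pointers from the end.
    tokens = []
    end = n
    while end > 0:
        tokens.append(text[back[end]:end])
        end = back[end]
    return tokens[::-1]

# Toy unigram model (hypothetical probabilities).
toy_logprob = {
    "研究": math.log(0.3),
    "研究生": math.log(0.1),
    "生命": math.log(0.3),
    "起源": math.log(0.3),
}
segments = dp_segment("研究生命起源", toy_logprob)
# segments == ["研究", "生命", "起源"]
```

Because every prefix is scored from the best scores of shorter prefixes, the search considers all segmentations and avoids the locally greedy choice of "研究生".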
S22: Filter the initial word segments, and perform synonym replacement on the filtered initial word segments to obtain the sampled word segments.
Specifically, the initial word segments obtained from the parsing are filtered to remove unnecessary or redundant words, leaving word segments better suited to the sampling inspection. Synonym replacement is then performed on the filtered initial word segments to obtain the sampled word segments. The purpose of synonym replacement is to convert synonyms and near-synonyms among the word segments into a single standard vocabulary, which further simplifies the word segments.
The filtering removes initial word segments whose parts of speech, such as adjectives and adverbs, have little effect on the semantic description, and retains those that play a key role in semantic expression, such as nouns, verbs, and quantifiers.
Synonym replacement means converting synonyms and near-synonyms into a unified standard vocabulary.
For example, in a specific embodiment, the initial word segments obtained are "张三" (Zhang San), "正在" (currently), "疯狂地" (crazily), and "打CALL" (cheering someone on). "正在" and "疯狂地" are both adverbs and can be filtered out, whereas "张三" is a person's name and "打CALL" denotes a specific action. After filtering, the two word segments "张三" and "打CALL" remain; synonym replacement is then applied to "打CALL", giving the sampled word segments "张三" and "欢呼" (cheer).
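The filtering in the example above can be sketched as a part-of-speech filter. The sketch assumes the word segments have already been tagged by some upstream tagger; the tag names and the identifiers `KEPT_POS` and `filter_segments` are hypothetical.

```python
# Parts of speech retained by the filter (tag names are assumed).
KEPT_POS = {"noun", "verb", "quantifier"}

def filter_segments(tagged_segments, kept_pos=KEPT_POS):
    """Keep only the word segments whose part of speech is in kept_pos.

    tagged_segments: list of (word, part_of_speech) pairs.
    """
    return [word for word, pos in tagged_segments if pos in kept_pos]

# The example from the text, with assumed tags.
tagged = [("张三", "noun"), ("正在", "adverb"), ("疯狂地", "adverb"), ("打CALL", "verb")]
filtered = filter_segments(tagged)
# filtered == ["张三", "打CALL"]
```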
S23: For the same historical crowdsourcing task, obtain all sampled word segments corresponding to that task and count the occurrences of each sampled word segment to obtain its sampling word frequency.
Specifically, for a given historical crowdsourcing task, all sampled word segments corresponding to that task are obtained, and the occurrences of each distinct sampled word segment are counted to obtain the sampling word frequency of that word segment within the task.
The sampling word frequency is the frequency with which a sampled word segment occurs within the same historical crowdsourcing task. It may be expressed as the number of occurrences of the sampled word segment or as the proportion of its occurrences, and can be set according to the actual situation.
For example, in a specific embodiment, a historical crowdsourcing task corresponds to eight sampled word segments: segment 1, segment 2, segment 3, segment 4, segment 5, segment 6, segment 7, and segment 8. Their occurrence counts are, in order, 22, 5, 19, 25, 8, 1, 20, and 2, so the corresponding sampling word frequencies are 22, 5, 19, 25, 8, 1, 20, and 2.
S24: Take each sampled word segment whose sampling word frequency is greater than a preset word frequency as a sampling keyword, and store the sampling keywords in the preset answer lexicon.
Specifically, the server stores a preset word frequency and compares each sampling word frequency against it. When a sampling word frequency is greater than the preset word frequency, the corresponding sampled word segment is taken as a sampling keyword and stored in the preset answer lexicon.
The preset word frequency can be set according to actual sampling needs.
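Steps S23 and S24 can be illustrated together with the occurrence counts from the example above. The preset word frequency of 10 and the names `extract_keywords` and `segment_1` to `segment_8` are assumptions chosen for this sketch.

```python
from collections import Counter

def extract_keywords(sampled_segments, preset_frequency):
    """Count occurrences (S23) and keep segments above the threshold (S24)."""
    frequency = Counter(sampled_segments)
    return {seg for seg, count in frequency.items() if count > preset_frequency}

# Occurrence counts from the example in the text.
example_counts = {
    "segment_1": 22, "segment_2": 5, "segment_3": 19, "segment_4": 25,
    "segment_5": 8, "segment_6": 1, "segment_7": 20, "segment_8": 2,
}
# Expand the counts into a flat list of sampled word segments for one task.
sampled_segments = [seg for seg, count in example_counts.items() for _ in range(count)]

keywords = extract_keywords(sampled_segments, preset_frequency=10)
# keywords == {"segment_1", "segment_3", "segment_4", "segment_7"}
```

With the assumed threshold of 10, the four word segments occurring 22, 19, 25, and 20 times become sampling keywords and the remaining four are discarded.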
In this embodiment, word segmentation of the answers yields the initial word segments relevant to the sampling inspection; filtering and synonym replacement further simplify them into the sampled word segments; and extracting keywords from the sampled word segments yields more precise sampling keywords.
In an embodiment, the filtering of the initial word segments and the synonym replacement on the filtered initial word segments in step S22, which yield the sampled word segments, are described in detail. The specific process is as follows:
Synonym replacement is performed on the initial word segments by means of named entity recognition to obtain the sampled word segments.
Specifically, synonym replacement is performed on the filtered initial word segments by means of named entity recognition to obtain the sampled word segments.
Named Entity Recognition (NER) is a basic method for discovering named entities, in which determining entity boundaries is mainly related to word segmentation. It is used to identify entities with specific meaning in text, is an important part of making natural language processing practical, and plays an important foundational role in application fields such as information extraction, syntactic analysis, and machine translation. NER must identify entity boundaries on the one hand and entity categories, such as person names, place names, and organization names, on the other.
In this embodiment, synonym replacement is performed by means of named entity recognition on the initial word segments that have already been filtered. This further simplifies the initial word segments into the sampled word segments and improves their precision.
Referring to FIG. 4, FIG. 4 shows the specific implementation, in step S22, of performing synonym replacement on the initial word segments by means of named entity recognition to obtain the sampled word segments, described in detail as follows:
S221: Obtain a preset standard vocabulary dictionary.
Specifically, a standard vocabulary dictionary is set up on the server in advance. It is configured according to the vocabulary of the answers to be inspected and can effectively filter out relatively redundant words, such as unneeded prepositions and adjectives, or synonyms and near-synonyms.
S222: For each initial word segment, perform named entity recognition between the initial word segment and each word in the standard vocabulary dictionary by traversal to obtain an entity recognition result.
Specifically, performing named entity recognition between each initial word segment and every word in the standard vocabulary dictionary, one by one, gives different recognition results: the two may be the same named entity, or they may be different named entities.
For example, two recognized named entities that are different strings on the surface may both refer to the city of New York and therefore need to be merged. As another example, the two recognized named entities "打CALL" and "欢呼" are worded differently, but both carry the meaning "cheer", so the entity names need to be merged.
S223: If the entity recognition result is that the same named entity exists, obtain the initial word segment and the standard word corresponding to the recognition result, and replace the initial word segment with the standard word.
Specifically, an entity recognition result of "same named entity" means that the two word segments have the same meaning. The initial word segment and the standard word corresponding to the recognition result are obtained, and the initial word segment is replaced with the standard word to obtain the sampled word segment.
For example, suppose one initial word segment is the named entity "打CALL" and another is the named entity "欢呼". The named entity recognition in step S222 finds that they are the same named entity. The initial word segments "打CALL" and "欢呼" and the standard word "喝彩" (acclaim) are obtained, and "打CALL" and "欢呼" are replaced with "喝彩" to give the sampled word segment "喝彩".
本实施例中,通过进一步简化初始分词,得到更加贴近抽检目的的抽检分词,过滤掉了初始分词中没有意义的介词、形容词等词语,并替换相同或相近词义的词语,提高了抽检分词选取的精准度。In this embodiment, by further simplifying the initial word segmentation, a sampling segmentation that is closer to the sampling purpose is obtained, and words such as prepositions and adjectives that have no meaning in the initial segmentation are filtered out, and words with the same or similar word meaning are replaced, which improves the selection of the sampling segmentation. Accuracy.
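The dictionary-based replacement in steps S221 to S223 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the named entity recognition step is replaced here by a plain lookup table, and the names STANDARD_VOCAB, STOP_WORDS, build_lookup, and normalize_segments, as well as the stop word “的”, are assumptions made for illustration. Only the example words “打CALL”, “欢呼”, and “喝彩” come from the text.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical standard vocabulary dictionary: standard word -> known variants.
STANDARD_VOCAB: Dict[str, Set[str]] = {
    "喝彩": {"打CALL", "欢呼"},
}

# Hypothetical list of words to filter out before replacement.
STOP_WORDS: Set[str] = {"的"}


def build_lookup(vocab: Dict[str, Set[str]]) -> Dict[str, str]:
    """Invert the dictionary so that every variant maps to its standard word."""
    lookup: Dict[str, str] = {}
    for standard, variants in vocab.items():
        lookup[standard.lower()] = standard
        for variant in variants:
            lookup[variant.lower()] = standard
    return lookup


def normalize_segments(
    segments: Iterable[str],
    vocab: Dict[str, Set[str]] = STANDARD_VOCAB,
    stop_words: Set[str] = STOP_WORDS,
) -> List[str]:
    """Drop stop words, then replace each remaining segment by its standard word.

    Assumption: a table lookup stands in for the named entity recognition
    described in step S222.
    """
    lookup = build_lookup(vocab)
    result: List[str] = []
    for segment in segments:
        key = segment.strip().lower()
        if not key or key in stop_words:
            continue  # filtered out as redundant
        result.append(lookup.get(key, segment.strip()))
    return result
```

With these assumed tables, normalize_segments(["打CALL", "的", "欢呼", "观众"]) keeps “观众” unchanged, drops “的”, and maps both “打CALL” and “欢呼” to “喝彩”.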
Please refer to FIG. 5, which shows a specific implementation of step S3, in which, for the sampled word segments in each response answer of each respondent, the number of times the sampled word segments hit the sampling keywords in the preset answer vocabulary is counted as the base count corresponding to each response answer. The details are as follows:
S31: Obtain each historical crowdsourcing task in which the respondent participated, as a reference task.
Specifically, the server stores every historical crowdsourcing task in which each respondent participated. Taking each of these tasks as a reference task makes it possible to cover all of the respondent's response answers.
S32: For each reference task, obtain the sampling keywords from the preset answer vocabulary, and count the number of times the sampled word segments of the respondent's response answer hit the sampling keywords, as the base count.
Specifically, because the inspection must be targeted, it needs to check how correct each respondent's response answers to the crowdsourcing tasks are. Counting how many times the sampled word segments of different response answers hit the keywords reveals how capable each respondent is of completing crowdsourcing tasks.
In this embodiment, taking each historical crowdsourcing task in which the respondent participated as a reference task and determining the base count provides the basis for the subsequent determination of the reliability value.
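Steps S31 and S32 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the method as claimed: the function names count_keyword_hits and base_counts and the dictionary layout (task identifier mapped to segments or keywords) are hypothetical, and each keyword is assumed to count as at most one hit per answer so that the hit total can never exceed the keyword total.

```python
from typing import Dict, Iterable, List, Set


def count_keyword_hits(answer_segments: Iterable[str], keywords: Set[str]) -> int:
    """Count the distinct sampling keywords that occur in one answer.

    Assumption: a keyword repeated in the answer is still a single hit.
    """
    return len(set(answer_segments) & keywords)


def base_counts(
    answers_by_task: Dict[str, List[str]],
    keyword_bank: Dict[str, Set[str]],
) -> Dict[str, int]:
    """Return the base count of one respondent for every reference task.

    answers_by_task: task id -> sampled word segments of the respondent's answer.
    keyword_bank:    task id -> sampling keywords stored for that task.
    """
    return {
        task_id: count_keyword_hits(segments, keyword_bank.get(task_id, set()))
        for task_id, segments in answers_by_task.items()
    }
```

For instance, an answer segmented as ["a", "b", "b"] checked against the keywords {"a", "b", "c"} yields a base count of 2 under this counting rule.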
Please refer to FIG. 6, which shows a specific implementation of step S4, in which the reliability value corresponding to each respondent is determined from the base count corresponding to each response answer. The details are as follows:
S41: Based on the base counts, compute M, the total number of sampling keywords over the reference tasks, and N, the respondent's total number of hits over the reference tasks, where M and N are both positive integers and N is less than or equal to M.
S42: Calculate the reliability value δ using the formula δ = N/M.
For example, suppose the total number of sampling keywords M for the reference tasks is 10. If one respondent's total number of hits N over the reference tasks is 2, the formula gives a reliability value δ of 0.2; if another respondent's total N is 4, the reliability value δ is 0.4. Because a higher reliability value means a lower risk in the response answers the respondent submits, the respondent with δ = 0.4 carries less risk than the respondent with δ = 0.2.
In this embodiment, counting the base counts and applying the formula to obtain a specific reliability value addresses the prior art's excessive reliance on random sampling of response answers. With a precise reliability value, it is possible to know accurately which respondents' response answers can be trusted and which respondents' response answers are too random.
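The worked example above (M = 10 with N = 2 giving δ = 0.2, and N = 4 giving δ = 0.4) corresponds to the ratio δ = N/M. A short Python sketch under that reading is given below; the function name reliability and its list-based arguments are assumptions for illustration.

```python
from typing import Sequence


def reliability(hits_per_task: Sequence[int], keywords_per_task: Sequence[int]) -> float:
    """Compute delta = N / M for one respondent.

    M is the total number of sampling keywords over the reference tasks and
    N is the respondent's total number of hits, with 0 <= N <= M and M > 0.
    """
    m = sum(keywords_per_task)
    n = sum(hits_per_task)
    if m <= 0:
        raise ValueError("M must be positive")
    if not 0 <= n <= m:
        raise ValueError("N must satisfy 0 <= N <= M")
    return n / m
```

Calling reliability([2], [10]) reproduces the first case of the example, and reliability([1, 3], [4, 6]) reproduces the second with the totals spread over two reference tasks.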
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application.
A person of ordinary skill in the art can understand that all or part of the processes in the methods of the foregoing embodiments can be completed by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium and, when executed, may include the processes of the foregoing method embodiments. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (ROM), or it may be a random access memory (RAM) or the like.
Please refer to FIG. 7. As an implementation of the method shown in FIG. 2, this application provides an embodiment of a sampling inspection apparatus for crowdsourcing tasks. The apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus can be applied in various electronic devices.
As shown in FIG. 7, the sampling inspection apparatus for crowdsourcing tasks in this embodiment includes an acquisition module 51, a parsing module 52, a statistics module 53, a determination module 54, and a selection module 55, wherein:
The acquisition module 51 is configured to obtain, for each historical crowdsourcing task, each respondent's response answer to the historical crowdsourcing task.
The parsing module 52 is configured to parse the response answers to obtain sampled word segments, extract sampling keywords from the sampled word segments, and store the sampling keywords in a preset answer vocabulary.
The statistics module 53 is configured to count, for the sampled word segments in each response answer of each respondent, the number of times the sampled word segments hit the sampling keywords, as the base count corresponding to each response answer.
The determination module 54 is configured to determine the reliability value corresponding to each respondent according to the base count corresponding to each response answer.
The selection module 55 is configured to select a preset number of respondents in ascending order of reliability value as inspection targets, and to check the response answers corresponding to the inspection targets.
Further, the parsing module 52 includes:
A word segmentation unit, configured to segment the response answer using a dynamic programming algorithm to obtain initial word segments;
A sampled word segment determination unit, configured to filter the initial word segments and perform synonym replacement on the filtered initial word segments to obtain the sampled word segments;
A sampling word frequency determination unit, configured to obtain, for the same historical crowdsourcing task, all sampled word segments corresponding to the historical crowdsourcing task, and to count the number of occurrences of each sampled word segment to obtain the sampling word frequency corresponding to the sampled word segment;
A sampling keyword determination unit, configured to take the sampled word segments whose sampling word frequency is greater than a preset word frequency as sampling keywords, and to store the sampling keywords in the preset answer vocabulary.
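The behavior of the sampling word frequency determination unit and the sampling keyword determination unit can be sketched in Python as follows. The function name extract_keywords and the representation of answers as lists of segments are assumptions for illustration; the strict "greater than" comparison follows the description of the preset word frequency.

```python
from collections import Counter
from typing import Iterable, List, Set


def extract_keywords(segmented_answers: Iterable[List[str]], preset_frequency: int) -> Set[str]:
    """Return the sampled word segments whose frequency exceeds the preset value.

    segmented_answers: sampled word segments of every answer to one historical task.
    preset_frequency:  threshold; a segment qualifies only if its count is greater.
    """
    # Count occurrences of each segment across all answers to the same task.
    frequency = Counter(
        segment for answer in segmented_answers for segment in answer
    )
    return {segment for segment, count in frequency.items() if count > preset_frequency}
```

For three answers segmented as ["a", "b"], ["a", "c"], and ["a", "b"] with a preset frequency of 1, the segments "a" (3 occurrences) and "b" (2 occurrences) qualify as sampling keywords, while "c" does not.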
Further, the sampled word segment determination unit includes:
A named entity recognition subunit, configured to perform synonym replacement on the initial word segments by means of named entity recognition to obtain the sampled word segments.
Further, the sampled word segment determination unit also includes:
A standard vocabulary dictionary acquisition subunit, configured to obtain a preset standard vocabulary dictionary;
An entity recognition result determination subunit, configured to traverse the standard vocabulary dictionary for each initial word segment and perform named entity recognition on the initial word segment against each word in the dictionary to obtain an entity recognition result;
A standard word replacement subunit, configured to obtain, if the entity recognition result indicates that the same named entity exists, the initial word segment and the standard word corresponding to the recognition result, and to replace the initial word segment with the standard word.
Further, the statistics module 53 includes:
A reference task determination unit, configured to obtain each historical crowdsourcing task in which the respondent participated, as a reference task;
A base count determination unit, configured to obtain, for each reference task, the sampling keywords from the preset answer vocabulary, and to count the number of times the sampled word segments of the respondent's response answer hit the sampling keywords, as the base count.
Further, the determination module 54 includes:
A base count statistics unit, configured to compute, based on the base counts, M, the total number of sampling keywords over the reference tasks, and N, the respondent's total number of hits over the reference tasks, where M and N are both positive integers and N is less than or equal to M;
A reliability value determination unit, configured to calculate the reliability value δ using the formula δ = N/M.
In the sampling inspection apparatus for crowdsourcing tasks in the above solution, the acquisition module 51 obtains, for each historical crowdsourcing task, each respondent's response answer to the historical crowdsourcing task. The parsing module 52 parses the response answers to obtain sampled word segments, extracts sampling keywords from the sampled word segments, and stores the sampling keywords in the preset answer vocabulary; distilling the numerous sampled word segments into highly targeted sampling keywords effectively improves inspection efficiency. The statistics module 53 counts, for the sampled word segments in each response answer of each respondent, the number of times the sampled word segments hit the sampling keywords, as the base count of each response answer; this allows each respondent to be converted into a corresponding reliability value and makes the inspection more targeted. The determination module 54 determines the reliability value corresponding to each respondent according to the base count corresponding to each response answer, and the selection module 55 then selects a preset number of respondents in ascending order of reliability value as inspection targets, which makes the inspection more targeted and helps improve inspection efficiency.
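The final selection performed by the selection module 55 can be sketched in Python as follows. The function name select_for_inspection and the mapping from respondent identifier to reliability value are assumptions for illustration; the sketch only shows the "ascending order of reliability value, preset number" rule.

```python
import heapq
from typing import Dict, List


def select_for_inspection(reliability_by_respondent: Dict[str, float], preset_number: int) -> List[str]:
    """Pick the respondents with the lowest reliability values.

    Returns up to preset_number respondent ids, ordered from the least
    reliable upward; their answers are the ones to be checked.
    """
    if preset_number < 0:
        raise ValueError("preset_number must be non-negative")
    return heapq.nsmallest(
        preset_number,
        reliability_by_respondent,
        key=reliability_by_respondent.get,
    )
```

With reliability values {"a": 0.4, "b": 0.2, "c": 0.9} and a preset number of 2, the respondents "b" and "a" are selected, in that order, and "c" is left out of the inspection.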
To solve the above technical problem, the embodiments of this application also provide a computer device. Please refer to FIG. 8, which is a block diagram of the basic structure of the computer device in this embodiment.
The computer device 6 includes a memory 61, a processor 62, and a network interface 63 that are communicatively connected to one another via a system bus. It should be noted that the figure shows only a computer device 6 with the three components memory 61, processor 62, and network interface 63; it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead. Those skilled in the art will understand that the computer device here is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions. Its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.
The computer device may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. The computer device can interact with the user through a keyboard, a mouse, a remote control, a touchpad, a voice control device, or the like.
The memory 61 includes at least one type of readable storage medium. The readable storage medium includes flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disc, and the like. In some embodiments, the memory 61 may be an internal storage unit of the computer device 6, such as a hard disk or internal memory of the computer device 6. In other embodiments, the memory 61 may be an external storage device of the computer device 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the computer device 6. Of course, the memory 61 may also include both the internal storage unit of the computer device 6 and its external storage device. In this embodiment, the memory 61 is generally used to store the operating system and the various application software installed on the computer device 6, such as the computer-readable instructions of the sampling inspection method for crowdsourcing tasks. In addition, the memory 61 may be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 62 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 62 is generally used to control the overall operation of the computer device 6. In this embodiment, the processor 62 is configured to run the computer-readable instructions stored in the memory 61 or to process data, for example to run the computer-readable instructions of the sampling inspection method for crowdsourcing tasks.
The network interface 63 may include a wireless network interface or a wired network interface, and is generally used to establish a communication connection between the computer device 6 and other electronic devices.
This application also provides another embodiment, namely a computer-readable storage medium, which may be non-volatile or volatile. The computer-readable storage medium stores a sampling inspection program that can be executed by at least one processor, so that the at least one processor performs the steps of the sampling inspection method for crowdsourcing tasks described above.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods of the embodiments of this application.
Obviously, the embodiments described above are only some of the embodiments of this application, not all of them. The drawings show preferred embodiments of this application but do not limit its patent scope. This application can be implemented in many different forms; these embodiments are provided so that the disclosure of this application is understood more thoroughly and comprehensively. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing specific embodiments, or make equivalent replacements of some of their technical features. Any equivalent structure made using the contents of the description and drawings of this application, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of this application.

Claims (20)

  1. A sampling inspection method for crowdsourcing tasks, comprising:
    for each historical crowdsourcing task, obtaining each respondent participating in the historical crowdsourcing task and the response answer corresponding to each respondent;
    parsing the response answers to obtain sampled word segments, extracting sampling keywords from the sampled word segments, and storing the sampling keywords in a preset answer vocabulary;
    for the sampled word segments in each response answer of each respondent, counting the number of times the sampled word segments hit the sampling keywords in the preset answer vocabulary, as the base count corresponding to each response answer;
    determining the reliability value corresponding to each respondent according to the base count corresponding to each response answer;
    selecting a preset number of respondents in ascending order of the reliability value as inspection targets, and checking the response answers corresponding to the inspection targets.
  2. The sampling inspection method for crowdsourcing tasks according to claim 1, wherein the parsing the response answers to obtain sampled word segments comprises:
    segmenting the response answer using a dynamic programming algorithm to obtain initial word segments;
    filtering the initial word segments, and performing synonym replacement on the filtered initial word segments to obtain the sampled word segments.
  3. The sampling inspection method for crowdsourcing tasks according to claim 1, wherein the extracting sampling keywords from the sampled word segments and storing the sampling keywords in a preset answer vocabulary comprises:
    for the same historical crowdsourcing task, obtaining all sampled word segments corresponding to the historical crowdsourcing task, and counting the number of occurrences of each sampled word segment to obtain the sampling word frequency corresponding to the sampled word segment;
    taking the sampled word segments whose sampling word frequency is greater than a preset word frequency as sampling keywords, and storing the sampling keywords in the preset answer vocabulary.
  4. The sampling inspection method for crowdsourcing tasks according to claim 2, wherein the performing synonym replacement on the filtered initial word segments to obtain the sampled word segments comprises:
    performing synonym replacement on the initial word segments by means of named entity recognition to obtain the sampled word segments.
  5. The sampling inspection method for crowdsourcing tasks according to claim 4, wherein the performing synonym replacement on the initial word segments by means of named entity recognition comprises:
    obtaining a preset standard vocabulary dictionary;
    for each initial word segment, traversing the standard vocabulary dictionary and performing named entity recognition on the initial word segment against each word in the standard vocabulary dictionary to obtain an entity recognition result;
    if the entity recognition result indicates that the same named entity exists, obtaining the initial word segment and the standard word corresponding to the recognition result, and replacing the initial word segment with the standard word.
  6. The sampling inspection method for crowdsourcing tasks according to any one of claims 1 to 5, wherein the counting, for the sampled word segments in each response answer of each respondent, the number of times the sampled word segments hit the sampling keywords in the preset answer vocabulary, as the base count corresponding to each response answer, comprises:
    obtaining each historical crowdsourcing task in which the respondent participated, as a reference task;
    for each reference task, obtaining the sampling keywords from the preset answer vocabulary, and counting the number of times the sampled word segments of the respondent's response answer hit the sampling keywords, as the base count.
  7. The sampling inspection method for crowdsourcing tasks according to any one of claims 1 to 5, wherein the determining the reliability value corresponding to each respondent according to the base count corresponding to each response answer comprises:
    based on the base counts, computing M, the total number of sampling keywords over the reference tasks, and N, the respondent's total number of hits over the reference tasks, where M and N are both positive integers and N is less than or equal to M;
    calculating the reliability value δ using the formula δ = N/M.
  8. A sampling inspection apparatus for crowdsourcing tasks, comprising:
    an acquisition module, configured to obtain, for each historical crowdsourcing task, each respondent's response answer to the historical crowdsourcing task;
    a parsing module, configured to parse the response answers to obtain sampled word segments, extract sampling keywords from the sampled word segments, and store the sampling keywords in a preset answer vocabulary;
    a statistics module, configured to count, for the sampled word segments in each response answer of each respondent, the number of times the sampled word segments hit the sampling keywords in the preset answer vocabulary, as the base count corresponding to each response answer;
    a determination module, configured to determine the reliability value corresponding to each respondent according to the base count corresponding to each response answer;
    a selection module, configured to select a preset number of respondents in ascending order of the reliability value as inspection targets, and to check the response answers corresponding to the inspection targets.
  9. 一种计算机设备,包括存储器和处理器,所述存储器中存储有在所述处理器上运行的计算机可读指令,所述处理器执行所述计算机可读指令时实现如下所述的众包任务的抽检方法的步骤:A computer device, comprising a memory and a processor, the memory stores computer readable instructions running on the processor, and the processor implements the following crowdsourcing tasks when the processor executes the computer readable instructions The steps of the sampling method:
    针对每个历史众包任务,获取参与所述历史众包任务的每个应答对象,以及每个所述应答对象对应的应答答案;For each historical crowdsourcing task, obtain each response object participating in the historical crowdsourcing task and the response answer corresponding to each response object;
    对所述应答答案进行解析处理,得到抽检分词,并从所述抽检分词中提取抽检关键字,将所述抽检关键字存入到预设的答案词库;Analyzing the response answer to obtain random test segmentation words, extracting the random test keywords from the random test segmentation words, and storing the random test keywords in a preset answer vocabulary;
    针对所述每个应答对象的每个应答答案中的所述抽检分词,统计所述预设的答案词库中所述抽检分词命中所述抽检关键字的次数,作为每个应答答案对应的基础次数;For the sampled word segmentation in each response answer of each respondent, count the number of times the sampled word segmentation in the preset answer word database hits the sampled keyword as the basis for each answer answer frequency;
    根据所述每个应答答案对应的基础次数,确定每个应答对象对应的可靠性值;Determine the reliability value corresponding to each response object according to the basic times corresponding to each response answer;
    按照所述可靠性值由小到大的顺序,选取预设数量的应答对象,作为抽检对象,并对所述抽检对象对应的应答答案进行检查操作。According to the order of the reliability value from small to large, a preset number of response objects are selected as sampling objects, and the response answers corresponding to the sampling objects are checked.
  10. 根据权利要求9所述的计算机设备,其中,所述对所述应答答案进行解析处理,得到抽检分词包括:9. The computer device according to claim 9, wherein said performing analysis processing on said response answer to obtain random word segmentation comprises:
    使用动态规划算法,对所述应答答案进行分词处理,得到初始分词;Use a dynamic programming algorithm to perform word segmentation processing on the response answer to obtain an initial word segmentation;
    对所述初始分词进行过滤处理,并对过滤处理后的初始分词进行同义替换,得到所述抽检分词。Perform filtering processing on the initial word segmentation, and perform synonymous replacement on the initial word segmentation after the filtering processing to obtain the sampled word segmentation.
  11. The computer device according to claim 9, wherein extracting sampled keywords from the sampled segmented words and storing the sampled keywords in a preset answer vocabulary comprises:
    for a same historical crowdsourcing task, obtaining all sampled segmented words corresponding to the historical crowdsourcing task, and counting the number of occurrences of each sampled segmented word to obtain the sampled word frequency corresponding to that sampled segmented word;
    taking the sampled segmented words whose sampled word frequency is greater than a preset word frequency as sampled keywords, and storing the sampled keywords in the preset answer vocabulary.
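The frequency threshold of claim 11 can be sketched as below. The function name `extract_keywords` and the sample answers are hypothetical, and the answer vocabulary is modeled as a plain Python set for one task.

```python
from collections import Counter

def extract_keywords(answers_tokens, preset_freq):
    """Return the segmented words whose count across all answers of one task exceeds `preset_freq`."""
    # Count every occurrence of every segmented word over all answers to the task
    counts = Counter(tok for tokens in answers_tokens for tok in tokens)
    # Strictly greater than the preset word frequency, as the claim states
    return {tok for tok, c in counts.items() if c > preset_freq}

# Three hypothetical segmented answers to the same historical task
answers = [["red", "car"], ["red", "truck"], ["red", "car", "road"]]
keywords = extract_keywords(answers, 1)
print(sorted(keywords))  # ['car', 'red']
```

Words that most response objects agree on become keywords, so a later answer is judged by how well it matches the consensus.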
  12. The computer device according to claim 10, wherein performing synonym replacement on the filtered initial segmented words to obtain the sampled segmented words comprises:
    performing synonym replacement on the initial segmented words by means of named entity recognition to obtain the sampled segmented words.
  13. The computer device according to claim 12, wherein performing synonym replacement on the initial segmented words by means of named entity recognition comprises:
    obtaining a preset standard vocabulary dictionary;
    for each initial segmented word, traversing the standard vocabulary dictionary and performing named entity recognition on the initial segmented word against each word in the dictionary to obtain an entity recognition result;
    if the entity recognition result indicates that the same named entity exists, obtaining the initial segmented word and the standard word corresponding to the recognition result, and replacing the initial segmented word with the standard word.
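The traversal in claim 13 can be sketched as below. The named entity recognition itself is not specified in the claim, so the predicate `same_entity` is a hypothetical stand-in backed by a small alias table rather than a real recognition model. All names and data are assumptions for illustration.

```python
def normalize(tokens, standard_dict, same_entity):
    """Replace each token with the first standard word that names the same entity."""
    result = []
    for tok in tokens:
        replacement = tok
        # Traverse the standard vocabulary dictionary for this token
        for std in standard_dict:
            if same_entity(tok, std):
                replacement = std
                break
        result.append(replacement)
    return result

# Hypothetical alias table standing in for a real entity recognizer
ALIASES = {"NYC": "New York", "Big Apple": "New York"}

def same_entity(a, b):
    # Two strings match when they resolve to the same canonical name
    return ALIASES.get(a, a) == ALIASES.get(b, b)

standard = ["New York", "London"]
print(normalize(["NYC", "taxi", "Big Apple"], standard, same_entity))
# ['New York', 'taxi', 'New York']
```

After this step, different surface forms of the same entity count as the same segmented word in the frequency statistics.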
  14. The computer device according to any one of claims 9 to 13, wherein, for the sampled segmented words in each response answer of each response object, counting the number of times the sampled segmented words hit the sampled keywords in the preset answer vocabulary, as the base count corresponding to each response answer, comprises:
    obtaining each historical crowdsourcing task in which the response object participated, as a reference task;
    for each reference task, obtaining the sampled keywords from the preset answer vocabulary, and counting the number of times the sampled segmented words of the response object's response answer hit the sampled keywords, as the base count;
    and wherein determining a reliability value for each response object according to the base count corresponding to each response answer comprises:
    according to the base counts, summing the number of sampled keywords corresponding to each reference task to obtain M, and summing the response object's hit counts over the reference tasks to obtain N, where M and N are both positive integers and N is less than or equal to M;
    calculating with the formula for δ to obtain the reliability value δ.
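The formula for δ is not reproduced in this text, so the sketch below makes an explicit assumption: it uses the ratio N/M, which is consistent with the stated constraint N ≤ M but is not confirmed by the claim. The helper names and sample data are hypothetical.

```python
def base_count(tokens, keywords):
    """Number of distinct sampled keywords hit by one answer's segmented words."""
    # Counting distinct hits keeps the count at or below len(keywords)
    return len(set(tokens) & set(keywords))

def reliability(tasks):
    """Assumed reliability: delta = N / M over all reference tasks.

    `tasks` is a list of (answer_tokens, task_keywords) pairs for one response object.
    The ratio is an illustrative assumption, not the formula from the application.
    """
    m = sum(len(keywords) for _, keywords in tasks)                 # M: total keywords
    n = sum(base_count(tokens, keywords) for tokens, keywords in tasks)  # N: total hits
    return n / m if m else 0.0

# Two hypothetical reference tasks for one response object
tasks = [
    (["red", "car", "fast"], {"red", "car"}),   # hits both keywords
    (["blue"], {"green", "tree"}),              # hits none
]
print(reliability(tasks))  # 0.5
```

Under this assumption δ lies between 0 and 1, and a response object whose answers rarely match the consensus keywords receives a low value and is sampled first.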
  15. A computer-readable storage medium storing computer-readable instructions which, when executed by a processor, implement the following steps of a sampling inspection method for crowdsourcing tasks:
    for each historical crowdsourcing task, obtaining each response object that participated in the historical crowdsourcing task, and the response answer corresponding to each response object;
    parsing the response answers to obtain sampled segmented words, extracting sampled keywords from the sampled segmented words, and storing the sampled keywords in a preset answer vocabulary;
    for the sampled segmented words in each response answer of each response object, counting the number of times the sampled segmented words hit the sampled keywords in the preset answer vocabulary, as the base count corresponding to each response answer;
    determining a reliability value for each response object according to the base count corresponding to each response answer;
    selecting a preset number of response objects in ascending order of reliability value as sampling objects, and checking the response answers corresponding to the sampling objects.
  16. The computer-readable storage medium according to claim 15, wherein parsing the response answers to obtain sampled segmented words comprises:
    performing word segmentation on the response answer using a dynamic programming algorithm to obtain initial segmented words;
    filtering the initial segmented words, and performing synonym replacement on the filtered initial segmented words to obtain the sampled segmented words.
  17. The computer-readable storage medium according to claim 15, wherein extracting sampled keywords from the sampled segmented words and storing the sampled keywords in a preset answer vocabulary comprises:
    for a same historical crowdsourcing task, obtaining all sampled segmented words corresponding to the historical crowdsourcing task, and counting the number of occurrences of each sampled segmented word to obtain the sampled word frequency corresponding to that sampled segmented word;
    taking the sampled segmented words whose sampled word frequency is greater than a preset word frequency as sampled keywords, and storing the sampled keywords in the preset answer vocabulary.
  18. The computer-readable storage medium according to claim 16, wherein performing synonym replacement on the filtered initial segmented words to obtain the sampled segmented words comprises:
    performing synonym replacement on the initial segmented words by means of named entity recognition to obtain the sampled segmented words.
  19. The computer-readable storage medium according to claim 18, wherein performing synonym replacement on the initial segmented words by means of named entity recognition comprises:
    obtaining a preset standard vocabulary dictionary;
    for each initial segmented word, traversing the standard vocabulary dictionary and performing named entity recognition on the initial segmented word against each word in the dictionary to obtain an entity recognition result;
    if the entity recognition result indicates that the same named entity exists, obtaining the initial segmented word and the standard word corresponding to the recognition result, and replacing the initial segmented word with the standard word.
  20. The computer-readable storage medium according to any one of claims 15 to 19, wherein, for the sampled segmented words in each response answer of each response object, counting the number of times the sampled segmented words hit the sampled keywords in the preset answer vocabulary, as the base count corresponding to each response answer, comprises:
    obtaining each historical crowdsourcing task in which the response object participated, as a reference task;
    for each reference task, obtaining the sampled keywords from the preset answer vocabulary, and counting the number of times the sampled segmented words of the response object's response answer hit the sampled keywords, as the base count;
    and wherein determining a reliability value for each response object according to the base count corresponding to each response answer comprises:
    according to the base counts, summing the number of sampled keywords corresponding to each reference task to obtain M, and summing the response object's hit counts over the reference tasks to obtain N, where M and N are both positive integers and N is less than or equal to M;
    calculating with the formula for δ to obtain the reliability value δ.
PCT/CN2020/118461 2020-03-02 2020-09-28 Crowdsourced task inspection method, apparatus, computer device, and storage medium WO2021174829A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010134385.6 2020-03-02
CN202010134385.6A CN111460810A (en) 2020-03-02 2020-03-02 Crowd-sourced task spot check method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2021174829A1 (en) 2021-09-10

Family

ID=71679970

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/118461 WO2021174829A1 (en) 2020-03-02 2020-09-28 Crowdsourced task inspection method, apparatus, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN111460810A (en)
WO (1) WO2021174829A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116137073A (en) * 2023-04-19 2023-05-19 北京国电通网络技术有限公司 Remote intelligent selective examination method for electric power materials and equipment materials, electronic equipment and medium

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111460810A (en) * 2020-03-02 2020-07-28 平安科技(深圳)有限公司 Crowd-sourced task spot check method and device, computer equipment and storage medium
CN112765985B (en) * 2021-01-13 2023-10-27 中国科学技术信息研究所 Named entity identification method for patent embodiments in specific fields
CN113486246B (en) * 2021-07-26 2024-07-12 平安科技(深圳)有限公司 Information searching method, device, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080052061A1 (en) * 2006-08-25 2008-02-28 Kim Young Kil Domain-adaptive portable machine translation device for translating closed captions using dynamic translation resources and method thereof
CN103455535A (en) * 2013-05-08 2013-12-18 深圳市明唐通信有限公司 Method for establishing knowledge base based on historical consultation data
CN105117398A (en) * 2015-06-25 2015-12-02 扬州大学 Software development problem automatic answering method based on crowdsourcing
CN110196901A (en) * 2019-06-28 2019-09-03 北京百度网讯科技有限公司 Construction method, device, computer equipment and the storage medium of conversational system
CN111460810A (en) * 2020-03-02 2020-07-28 平安科技(深圳)有限公司 Crowd-sourced task spot check method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN111460810A (en) 2020-07-28

Similar Documents

Publication Publication Date Title
CN110692050B (en) Adaptive evaluation of primitive relationships in semantic graphs
CN108897867B (en) Data processing method, device, server and medium for knowledge question answering
Gu et al. "What parts of your apps are loved by users?" (T)
CN109522551B (en) Entity linking method and device, storage medium and electronic equipment
CN106940788B (en) Intelligent scoring method and device, computer equipment and computer readable medium
US9558264B2 (en) Identifying and displaying relationships between candidate answers
WO2021174829A1 (en) Crowdsourced task inspection method, apparatus, computer device, and storage medium
JP7153004B2 (en) COMMUNITY Q&A DATA VERIFICATION METHOD, APPARATUS, COMPUTER DEVICE, AND STORAGE MEDIUM
US8577884B2 (en) Automated analysis and summarization of comments in survey response data
WO2019118007A1 (en) Domain-specific natural language understanding of customer intent in self-help
US20150142423A1 (en) Phrase-based data classification system
US9535980B2 (en) NLP duration and duration range comparison methodology using similarity weighting
TWI643076B (en) Financial analysis system and method for unstructured text data
WO2019232893A1 (en) Method and device for text emotion analysis, computer apparatus and storage medium
CN112699645B (en) Corpus labeling method, apparatus and device
WO2021169485A1 (en) Dialogue generation method and apparatus, and computer device
CN112784591B (en) Data processing method and device, electronic equipment and storage medium
CN111143556A (en) Software function point automatic counting method, device, medium and electronic equipment
US20240070188A1 (en) System and method for searching media or data based on contextual weighted keywords
US20230070966A1 (en) Method for processing question, electronic device and storage medium
CN116402166A (en) Training method and device of prediction model, electronic equipment and storage medium
CN110717029A (en) Information processing method and system
CN114625960A (en) On-line evaluation method and device, electronic equipment and storage medium
CN115168577B (en) Model updating method and device, electronic equipment and storage medium
CN114003693A (en) Question answering method, model training method, equipment and program product thereof

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application. Ref document number: 20923048; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 Ep: pct application non-entry in european phase. Ref document number: 20923048; Country of ref document: EP; Kind code of ref document: A1