CN107705034B - Crowdsourcing platform implementation method and device, storage medium and electronic equipment - Google Patents


Info

Publication number
CN107705034B
CN107705034B
Authority
CN
China
Prior art keywords
data
labeling
crowdsourcing platform
data set
subset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711014277.XA
Other languages
Chinese (zh)
Other versions
CN107705034A (en)
Inventor
贾宇楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yidu Cloud Beijing Technology Co Ltd
Original Assignee
Yidu Cloud Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yidu Cloud Beijing Technology Co Ltd filed Critical Yidu Cloud Beijing Technology Co Ltd
Priority to CN201711014277.XA priority Critical patent/CN107705034B/en
Publication of CN107705034A publication Critical patent/CN107705034A/en
Application granted granted Critical
Publication of CN107705034B publication Critical patent/CN107705034B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393Score-carding, benchmarking or key performance indicator [KPI] analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/10Office automation; Time management
    • G06Q10/101Collaborative creation, e.g. joint development of products or services

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Quality & Reliability (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Educational Administration (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Game Theory and Decision Science (AREA)
  • Data Mining & Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a crowdsourcing platform implementation method and apparatus, a storage medium, and an electronic device, relating to the technical field of data processing. The crowdsourcing platform implementation method comprises the following steps: acquiring a data set to be annotated that is sent by a publisher and normalized by a transit warehouse; dividing the data set to be annotated into data subsets; sending the data subsets to a user terminal for annotation by an annotator; receiving the annotated data sent by the user terminal; and comparing the annotated data of multiple annotators who annotated the same data subset, and computing performance statistics for those annotators according to the comparison result. The method and apparatus automate the data-distribution process of a crowdsourcing platform and resolve inconsistent formats in the annotated data.

Description

Crowdsourcing platform implementation method and device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a crowdsourcing platform implementation method, a crowdsourcing platform implementation apparatus, a storage medium, and an electronic device.
Background
With the advent of the big data era, ever more data is being discovered and put to use. However, such data is often non-standardized, complex, and heterogeneous, so it cannot be recognized automatically by a computer; invalid data is discarded, and the usable value of the data is greatly reduced. To improve data quality and mine data value, the data must be manually classified, sorted, edited, corrected, annotated, and so on.
In existing approaches, a data annotation task is first manually split into multiple subtasks and distributed to multiple annotators; each annotator then selects a corresponding stand-alone annotation tool according to the data type of the data in the received subtask; finally, the data annotated by each annotator must be integrated and stored.
However, on one hand the annotated data is not normalized, so the results produced by different annotators may be inconsistent and data quality is not controlled; on the other hand, data splitting and related processes require human intervention, which is inefficient.
In view of this, a crowdsourcing platform implementation method, a crowdsourcing platform implementation apparatus, a storage medium, and an electronic device are needed.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
An object of the present disclosure is to provide a crowdsourcing platform implementation method, a crowdsourcing platform implementation apparatus, a storage medium, and an electronic device, thereby overcoming, at least to some extent, one or more of the problems due to the limitations and disadvantages of the related art.
According to an aspect of the present disclosure, there is provided a method for implementing a crowdsourcing platform, including:
acquiring a data set to be annotated that is sent by a publisher and normalized by a transit warehouse;
dividing the data set to be annotated into data subsets;
sending the data subsets to a user terminal for annotation by an annotator;
receiving the annotated data sent by the user terminal; and
comparing the annotated data of multiple annotators who annotated the same data subset, and computing performance statistics for the annotators according to the comparison result.
In an exemplary embodiment of the present disclosure, comparing the annotated data of multiple annotators who annotated the same data subset and computing performance statistics for them according to the comparison result comprises:
determining the difference content among the annotated data of the multiple annotators;
providing the data to be annotated corresponding to the difference content to an adjudicator, who annotates it to obtain a standard answer for the difference content;
and computing performance statistics for the multiple annotators according to the standard answer.
In an exemplary embodiment of the present disclosure, the crowdsourcing platform implementation method further includes:
sending the content on which the multiple annotators' annotations agree, together with the standard answers, to the transit warehouse for the publisher to obtain.
In an exemplary embodiment of the present disclosure, the crowdsourcing platform implementation method further includes:
obtaining an annotation suggestion for the data subset;
and, when the data subset is sent to a user terminal, sending the annotation suggestion to the user terminal so that the annotator can obtain it.
In an exemplary embodiment of the disclosure, obtaining an annotation suggestion for the data subset comprises:
acquiring a standard data set;
determining an annotation suggestion for the data subset based on the standard data set;
and acquiring the annotation suggestion.
In an exemplary embodiment of the present disclosure, the crowdsourcing platform implementation method further includes:
when the data subset is sent to a user terminal, obtaining an annotation rule for the data subset so as to control the user terminal to render an annotation interface based on the annotation rule.
According to an aspect of the present disclosure, there is provided a crowdsourcing platform implementation apparatus, including:
a data set acquisition module for acquiring a data set to be annotated that is sent by a publisher and normalized by a transit warehouse;
a data set dividing module for dividing the data set to be annotated into data subsets;
a data subset sending module for sending the data subsets to a user terminal for annotation by an annotator;
a data receiving module for receiving the annotated data sent by the user terminal; and
a performance statistics module for comparing the annotated data of multiple annotators who annotated the same data subset and computing performance statistics for the annotators according to the comparison result.
In an exemplary embodiment of the present disclosure, the performance statistics module includes:
a difference content determining unit for determining the difference content among the annotated data of multiple annotators who annotated the same data subset;
a difference content sending unit for providing the data to be annotated corresponding to the difference content to an adjudicator, who annotates it to obtain a standard answer for the difference content;
and a performance statistics unit for computing performance statistics for the multiple annotators according to the standard answers.
In an exemplary embodiment of the present disclosure, the crowdsourcing platform implementation apparatus further includes:
a data sending module for sending the content on which the multiple annotators' annotations agree, together with the standard answers, to the transit warehouse for the publisher to obtain.
In an exemplary embodiment of the present disclosure, the crowdsourcing platform implementation apparatus further includes:
an annotation suggestion acquisition module for obtaining an annotation suggestion for the data subset;
and an annotation suggestion sending module for sending the annotation suggestion to the user terminal when the data subset is sent, so that the annotator can obtain it.
In an exemplary embodiment of the present disclosure, the annotation suggestion acquisition module includes:
a standard data set acquisition unit for acquiring a standard data set;
an annotation suggestion determination unit for determining an annotation suggestion for the data subset based on the standard data set;
and an annotation suggestion acquisition unit for acquiring the annotation suggestion.
In an exemplary embodiment of the present disclosure, the crowdsourcing platform implementation apparatus further includes:
an annotation rule obtaining module for obtaining an annotation rule for the data subset when the data subset is sent to a user terminal, so as to control the user terminal to render an annotation interface based on the annotation rule.
According to an aspect of the present disclosure, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the crowdsourcing platform implementation method of any one of the above.
According to an aspect of the present disclosure, there is provided an electronic device including:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform any one of the above crowdsourcing platform implementation methods via execution of the executable instructions.
In the technical solutions provided by some embodiments of the present disclosure, the data set to be annotated, normalized by the transit warehouse, is divided into data subsets, which are sent to user terminals for annotation; the annotated data of multiple annotators who annotated the same data subset are compared, and performance statistics are computed for those annotators according to the comparison result. First, data distribution is automated without manual participation, improving work efficiency; second, comparing the annotations and deriving performance statistics from the comparison eliminates the manual tallying of annotator performance; third, because the transit warehouse normalizes the data in advance, inconsistent data formats are avoided.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty. In the drawings:
FIG. 1 schematically illustrates a flow diagram of a crowdsourcing platform implementation method according to an exemplary embodiment of the disclosure;
fig. 2 shows a schematic diagram of an architecture corresponding to a crowdsourcing platform implementation method according to an exemplary embodiment of the disclosure;
FIG. 3 schematically illustrates a block diagram of a crowdsourcing platform implementation apparatus according to an exemplary embodiment of the present disclosure;
FIG. 4 schematically illustrates a block diagram of a performance statistics module, according to an exemplary embodiment of the present disclosure;
FIG. 5 schematically illustrates another block diagram of a crowdsourcing platform implementing apparatus, according to an exemplary embodiment of the present disclosure;
FIG. 6 schematically illustrates yet another block diagram of a crowdsourcing platform implementing apparatus, according to an exemplary embodiment of the present disclosure;
FIG. 7 schematically illustrates a block diagram of an annotation suggestion acquisition module according to an exemplary embodiment of the present disclosure;
FIG. 8 schematically illustrates yet another block diagram of a crowdsourcing platform implementation apparatus, according to an exemplary embodiment of the present disclosure;
FIG. 9 shows a schematic diagram of a storage medium according to an example embodiment of the present disclosure; and
fig. 10 schematically shows a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the steps. For example, some steps may be decomposed, and some steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Fig. 1 schematically illustrates a flow chart of a crowdsourcing platform implementation method of an exemplary embodiment of the present disclosure. Referring to fig. 1, the crowdsourcing platform implementation method may include the following steps:
s10, acquiring a data set to be marked, which is sent by a publisher and subjected to standardized processing by a transfer warehouse;
s12, dividing the data set to be labeled into data subsets;
s14, sending the data subset to a user side for labeling by a labeling person;
s16, receiving the marked data sent by the user side; and
and S18, comparing the marked data of a plurality of marking personnel for marking the same data subset, and performing performance statistics on the plurality of marking personnel according to the comparison result.
In the crowdsourcing platform implementation method of the exemplary embodiment of the disclosure, the data set to be annotated, normalized by the transit warehouse, is divided into data subsets, which are sent to user terminals for annotation; the annotated data of multiple annotators who annotated the same data subset are compared, and performance statistics are computed according to the comparison result. First, data distribution is automated without manual participation, improving work efficiency; second, comparing the annotations and deriving performance statistics from the comparison eliminates the manual tallying of annotator performance; third, because the transit warehouse normalizes the data in advance, inconsistent data formats are avoided.
The following describes each step of the crowdsourcing platform implementation method of the present disclosure in detail.
In step S10, the data set to be annotated that is sent by the publisher and normalized by the transit warehouse is obtained.
First, a publisher may determine a data set that needs to be annotated according to its own requirements and send it to the transit warehouse through the publisher's terminal (e.g., a computer). The data in the data set may include picture data, text data, table data, video data, and the like, which is not particularly limited in this exemplary embodiment. The transit warehouse described in this disclosure may be a data processing unit with a storage function and may be configured with one or more interfaces that satisfy the publisher's data transmission protocol.
Next, the transit warehouse may normalize the received data set. Specifically, the publisher's data format may be converted into the data format uniformly processed by the crowdsourcing platform; for example, data in a picture format may be converted into data in a binary format. On one hand, a system developer may write data conversion code and store it in the transit warehouse; upon receiving a data set, the transit warehouse judges whether its data format matches the crowdsourcing platform's uniform format and, if not, triggers the data conversion code to perform the conversion. On the other hand, the transit warehouse may integrate a commercially available data conversion module to convert the data format sent by the publisher. In an embodiment of the present disclosure, the data set to be annotated may include a data set normalized by the transit warehouse and/or a data set that the transit warehouse has judged to already match the crowdsourcing platform's format.
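The check-then-convert behavior described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the field names, the `PLATFORM_FORMAT` constant, and the choice of base64 as the binary encoding are all assumptions made for the example.

```python
import base64
import json

# Hypothetical uniform format the crowdsourcing platform processes.
PLATFORM_FORMAT = "json"

def normalize_record(record: dict) -> dict:
    """Convert a publisher record into the platform's uniform format.

    Records already in the uniform format pass through unchanged;
    picture payloads are converted to a text-safe binary encoding.
    """
    fmt = record.get("format")
    if fmt == PLATFORM_FORMAT:
        return record  # already uniform; no conversion triggered
    if fmt == "picture":
        # e.g. raw image bytes -> base64 text (illustrative binary conversion)
        payload = base64.b64encode(record["payload"]).decode("ascii")
    else:
        payload = json.dumps(record["payload"])
    return {"format": PLATFORM_FORMAT, "payload": payload}

incoming = {"format": "picture", "payload": b"\x89PNG"}
print(normalize_record(incoming)["format"])  # -> json
```

In this sketch the format check that the transit warehouse performs on receipt is the first branch; only mismatched records pay the conversion cost.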
According to some embodiments of the disclosure, after determining the data set to be annotated, the transit warehouse may send it to the crowdsourcing platform. According to some other embodiments, the crowdsourcing platform may monitor in real time whether a data set to be annotated exists in the transit warehouse; when one is found, the platform may send an acquisition request to the transit warehouse, which then sends the data set to the crowdsourcing platform in response.
By configuring the transit warehouse, on one hand, the publisher's data format can be converted into the format uniformly processed by the crowdsourcing platform, resolving the conflict between the diversity of publisher data and the normalization the platform requires, and ensuring that the annotated data has a unified, easily integrated format; on the other hand, placing the transit warehouse between the publisher and the crowdsourcing platform prevents the platform from being directly exposed to the publisher, improving data security.
In step S12, the data set to be annotated is divided into data subsets.
According to some embodiments of the present disclosure, the crowdsourcing platform may equally divide the data set to be annotated into data subsets. The number of divisions can be matched to the number of annotators; for example, with 100 annotators, the data to be annotated may be equally divided into 100 data subsets.
According to other embodiments, the crowdsourcing platform may divide the data to be annotated according to the data types in the data set; for example, a picture of a person and an enterprise report belong to different data types. Data types may also be configured in advance: after receiving the data to be annotated, the crowdsourcing platform first determines the data types it contains and then divides the data by type.
It will be readily appreciated that, in further embodiments, equal division may be combined with division by data type. For example, when the data to be annotated contains three data types, the crowdsourcing platform may divide each type into 100 equal parts, obtaining 300 data subsets, and then allocate the 300 subsets to 100 annotators so that each annotator annotates each of the three data types once.
In addition, the crowdsourcing platform may divide the data set to be annotated using existing data-set partitioning techniques, such as k-fold cross-validation, the leave-one-out method, or random partitioning. A developer may also define a custom partition rule according to business needs, which is not particularly limited in this exemplary embodiment.
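The two division strategies above, equal division and division by data type followed by equal division, can be sketched as follows. The record schema (`type`, `id`) is an illustrative assumption, not part of the disclosure.

```python
def split_equally(items: list, n_parts: int) -> list:
    """Equally divide items into n_parts subsets (sizes differ by at most 1)."""
    return [items[i::n_parts] for i in range(n_parts)]

def split_by_type_then_equally(records: list, n_annotators: int) -> list:
    """Group records by data type, then equally divide each group,
    as in the combined embodiment described above."""
    by_type: dict = {}
    for rec in records:
        by_type.setdefault(rec["type"], []).append(rec)
    subsets = []
    for recs in by_type.values():
        subsets.extend(split_equally(recs, n_annotators))
    return subsets  # one subset per (type, annotator) pair

# Three data types, five annotators -> 3 * 5 = 15 subsets,
# so each annotator can receive one subset of each type.
records = [{"type": t, "id": i}
           for t in ("image", "text", "table") for i in range(10)]
subsets = split_by_type_then_equally(records, 5)
print(len(subsets))  # -> 15
```

With 100 annotators and three types this yields the 300 subsets of the example in the text.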
In step S14, the data subsets are sent to a user terminal for annotation by an annotator.
In an exemplary embodiment of the present disclosure, the user terminal is the terminal on which the annotator performs annotation operations and may include a PC and/or a mobile terminal such as a mobile phone.
On one hand, the annotator may send a request through the user terminal to acquire a data subset to be annotated, and the crowdsourcing platform may respond by sending the subset to the terminal; on the other hand, the crowdsourcing platform may predetermine the paths of the user terminals and, after dividing the data, push the subsets to each terminal along those paths for annotation.
In addition, the present disclosure does not specifically limit the data transmission protocol between the crowdsourcing platform and the user terminal; for security, it may be, for example, the HTTPS protocol.
According to some embodiments of the disclosure, the crowdsourcing platform may obtain an annotation suggestion for a data subset and send it to the user terminal along with the subset; the annotator may then view the suggestion and use it as a reference during annotation.
First, the crowdsourcing platform may obtain a standard data set, i.e., an annotated data set that has been repeatedly verified manually and can be considered to meet the annotation requirements. Next, the platform may determine annotation suggestions for the data subsets based on the standard data set; specifically, suggestions may be determined through a machine-learning training process over the standard data set, optionally incorporating third-party data, which may include, but is not limited to, the publisher's historical annotation suggestions and publisher annotation preferences derived from big-data analysis. The platform then obtains the annotation suggestion and sends it to the user terminal.
The crowdsourcing platform implementation method may further include determining the annotation suggestion by means of a machine-assisted annotation service platform, from which the crowdsourcing platform may acquire the suggestion. The functions of such a platform may include, but are not limited to, normalization, structuring, cleaning, and data classification and annotation services.
Sending annotation suggestions to annotators helps them quickly grasp the annotation approach, reduces unnecessary errors, and improves annotation efficiency.
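As a minimal stand-in for the machine-learning process the disclosure mentions, a suggestion can be derived from the standard data set by proposing the label of the most similar gold item. The token-overlap similarity, the field names, and the example labels below are all illustrative assumptions, not the patent's method.

```python
def token_overlap(a: str, b: str) -> int:
    """Crude similarity: number of shared whitespace-separated tokens."""
    return len(set(a.split()) & set(b.split()))

def suggest_label(item_text: str, standard_set: list) -> str:
    """Suggest the label of the closest item in the standard (gold) data set."""
    best = max(standard_set, key=lambda s: token_overlap(item_text, s["text"]))
    return best["label"]

# Hypothetical standard data set, repeatedly verified by hand.
standard = [
    {"text": "chest x ray image", "label": "medical-image"},
    {"text": "quarterly revenue report", "label": "enterprise-report"},
]
print(suggest_label("annual revenue report table", standard))
# -> enterprise-report
```

A production system would replace `token_overlap` with a trained model, possibly conditioned on the third-party data mentioned above, but the flow (standard set in, per-item suggestion out) is the same.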
According to some embodiments of the disclosure, when a data subset is sent to a user terminal, the crowdsourcing platform may obtain the annotation rule for the subset and send an interface configuration instruction to the terminal based on that rule; the terminal renders the annotation interface in response, standardizing the annotators' operations. Specifically, the annotation rule may be a data annotation rule specified by the publisher and may include, but is not limited to, format rules and indications of which items must be annotated, e.g., whether an item may be left empty or must be filled in by selection. The user-terminal interface may be, for example, a web page or a native interface of the terminal, which is not particularly limited in this exemplary embodiment.
Obtaining the annotation rule helps standardize annotation operations, avoids errors that might otherwise occur in the business logic, and ensures the reasonableness of the annotated data.
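The rule-to-interface translation can be sketched as a function that turns publisher rules into a configuration instruction for the client to render. The rule schema (`field`, `choices`, `required`) and widget names are hypothetical, invented for this example.

```python
def build_interface_config(rules: list) -> dict:
    """Translate publisher annotation rules into an interface configuration
    instruction; the user terminal renders widgets from this spec."""
    widgets = []
    for rule in rules:
        widgets.append({
            "field": rule["field"],
            # A rule with enumerated choices becomes a selection widget;
            # otherwise the annotator types free text.
            "widget": "dropdown" if rule.get("choices") else "text-input",
            "choices": rule.get("choices", []),
            "required": rule.get("required", True),  # must-annotate item
        })
    return {"instruction": "render", "widgets": widgets}

rules = [
    {"field": "diagnosis", "choices": ["benign", "malignant"], "required": True},
    {"field": "notes", "required": False},
]
config = build_interface_config(rules)
print(config["widgets"][0]["widget"])  # -> dropdown
```

Because the widget spec is derived from the rule rather than hand-built per task, a required selection item cannot be rendered as a free-text box, which is one way the rule standardizes annotator operations.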
In step S16, the annotated data sent by the user terminal is received.
After finishing the annotation of a data subset, the annotator may send the annotated data to the crowdsourcing platform through the user terminal, for example via the HTTPS protocol.
In step S18, the annotated data of multiple annotators who annotated the same data subset are compared, and performance statistics for the annotators are computed according to the comparison result.
In an exemplary embodiment of the present disclosure, to ensure the correctness of the annotated data, the same data subset may be annotated by multiple annotators. Specifically, after the annotations are completed and received, the crowdsourcing platform first determines the difference content among the annotated data of the multiple annotators; it then determines the data to be annotated corresponding to that difference content and sends it to an adjudicator, whose annotation serves as the standard answer for that data. The adjudicator may be one or more persons selected in advance for their high annotation accuracy. The adjudicator then sends the standard answers to the crowdsourcing platform, which computes performance statistics for the multiple annotators based on those answers.
It is easy to understand that when the data labeled by one labeling person differs from the standard answer in many places, that person's performance is low; when the data labeled by another labeling person is identical or nearly identical to the standard answer, that person's performance is high. The present disclosure does not limit the specific statistical process. In addition, the crowdsourcing platform may designate a labeling person whose performance has been high over a certain period of time as an adjudicator.
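One simple performance statistic consistent with the above is the fraction of adjudicated records on which a worker agreed with the standard answer. This is only an example metric; as stated, the disclosure does not fix the statistical process.

```python
def performance_score(worker_labels, standard_answers):
    """Fraction of adjudicated records on which the worker matched the
    standard answer — one possible agreement metric, not the only one."""
    if not standard_answers:
        return 1.0  # no difference content: all workers treated as correct
    correct = sum(1 for rid, answer in standard_answers.items()
                  if worker_labels.get(rid) == answer)
    return correct / len(standard_answers)


standard = {"r2": "yes"}  # adjudicator's answer for the disputed record
score_w1 = performance_score({"r1": "yes", "r2": "no"}, standard)   # 0.0
score_w2 = performance_score({"r1": "yes", "r2": "yes"}, standard)  # 1.0
```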
In addition, it should be noted that, on the one hand, because the probability that several labeling personnel make the same error at the same time is small, content labeled identically by a plurality of labeling personnel for the same data subset can be considered correct; on the other hand, if the data labeled by the plurality of labeling personnel is entirely identical or consistent, that is, there is no difference content, the crowdsourcing platform can directly determine that all of the labeled data is correct, and, where the data to be labeled is relatively complex, the labeling personnel can be considered to have a high service level.
The performance statistics process of the present disclosure may be the same as that in related technologies, and the present disclosure is not particularly limited thereto.
In addition, the crowdsourcing platform can send the standard answers corresponding to the difference content to the user sides of the labeling personnel; after reviewing the feedback, the labeling personnel can identify the data they labeled incorrectly, which helps them improve their service level.
The crowdsourcing platform implementation method can further comprise a step in which the crowdsourcing platform integrates the received labeled data; the specific integration process is not particularly limited.
According to some embodiments of the disclosure, the crowdsourcing platform may further send the same content of the labeled data and the standard answer to the transit warehouse for the publisher to obtain.
In addition, since the publisher's data format may not be consistent with the format of the labeled data, the transit warehouse may also be configured to convert the labeled data into the publisher's data format. The transit warehouse may then send the data to the publisher based on the data transmission protocol between the transit warehouse and the publisher.
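The format conversion performed by the transit warehouse can be sketched as below. Only JSON and CSV targets are shown, and the field names are assumptions; the disclosure does not limit which publisher formats are supported.

```python
import csv
import io
import json


def convert_to_publisher_format(labeled_records, publisher_format):
    """Convert the platform's uniform labeled records (list of dicts) into
    the publisher's format. A sketch covering two hypothetical targets."""
    if publisher_format == "json":
        return json.dumps(labeled_records)
    if publisher_format == "csv":
        buf = io.StringIO()
        # sort field names so the column order is deterministic
        writer = csv.DictWriter(buf, fieldnames=sorted(labeled_records[0]))
        writer.writeheader()
        writer.writerows(labeled_records)
        return buf.getvalue()
    raise ValueError(f"unsupported publisher format: {publisher_format}")


records = [{"record": "r1", "label": "yes"}, {"record": "r2", "label": "no"}]
csv_text = convert_to_publisher_format(records, "csv")
```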
In conclusion, the data annotation process based on the crowdsourcing platform is realized.
The architecture corresponding to the implementation method of the crowdsourcing platform of the present disclosure will be described with reference to fig. 2.
First, the publisher 21 can send a data set to be labeled to the transit warehouse 22 according to its own requirements. The transit warehouse 22 may then normalize the data set to be labeled and send the processed data set to the crowdsourcing platform 23. Next, the crowdsourcing platform 23 may divide the data set to be labeled into data subsets and send the data subsets to the user sides respectively; as shown in fig. 2, the user sides may include, but are not limited to, a user side 24, a user side 25, and a user side 29, and the number of user sides is not particularly limited in the present disclosure. Then, after each labeling person finishes labeling the data subset, the labeled data can be sent to the crowdsourcing platform 23 through each user side. Next, the crowdsourcing platform 23 may compare the labeled data of a plurality of labeling personnel who labeled the same data subset and perform performance statistics on them according to the comparison result. In addition, the crowdsourcing platform 23 can feed the comparison result back to the labeling personnel at each user side to help them improve their service capability, and can send the identical content of the labeled data of the plurality of labeling personnel, together with the standard answers, to the transit warehouse 22 for the publisher 21 to obtain.
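The division step in the flow above can be sketched as follows. Round-robin division into one subset per labeling person is only one simple instance; as claimed, the division may also depend on the data type or a dedicated division algorithm.

```python
def split_into_subsets(data_set, num_workers):
    """Divide the data set to be labeled into roughly equal subsets,
    one per labeling person (round-robin).

    A minimal sketch of the division step; the disclosure also allows
    division by data type or by a data set division algorithm."""
    subsets = [[] for _ in range(num_workers)]
    for i, record in enumerate(data_set):
        subsets[i % num_workers].append(record)
    return subsets


subsets = split_into_subsets(["r1", "r2", "r3", "r4", "r5"], 2)
# each subset is then sent to one user side for labeling
```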
It should be noted that although the various steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that these steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.
Further, an apparatus for implementing a crowdsourcing platform is also provided in this example embodiment.
Fig. 3 schematically illustrates a block diagram of a crowdsourcing platform implementation apparatus of an exemplary embodiment of the present disclosure. Referring to fig. 3, the crowdsourcing platform implementing device 3 according to the exemplary embodiment of the present disclosure may include a data set acquiring module 31, a data set dividing module 33, a data subset transmitting module 35, a data receiving module 37 and a performance statistics module 39, wherein:
the data set obtaining module 31 may be configured to obtain a data set to be annotated, which is sent by a publisher and is normalized by a transit warehouse;
a data set dividing module 33, configured to divide the data set to be annotated into data subsets;
the data subset sending module 35 may be configured to send the data subset to a user side for a labeling person to label;
the data receiving module 37 may be configured to receive the labeled data sent by the user side; and
the performance statistics module 39 may be configured to compare the labeled data of a plurality of labeling personnel labeling the same data subset, and perform performance statistics on the plurality of labeling personnel according to the comparison result.
In the crowdsourcing platform implementation device of the exemplary embodiment of the disclosure, first, an automatic data distribution process can be realized without manual participation, which improves working efficiency; second, the labeled data of a plurality of labeling personnel who labeled the same data subset is compared and performance statistics are performed on them according to the comparison result, which eliminates the process of manually counting their performance; third, the data is normalized in advance by the transit warehouse, which avoids inconsistent data formats.
According to an exemplary embodiment of the present disclosure, referring to fig. 4, the performance statistics module 39 may include a difference content determination unit 401, a difference content transmission unit 403, and a performance statistics unit 405, wherein:
a difference content determining unit 401, configured to determine difference content between labeled data of a plurality of labeling persons labeling the same data subset;
a difference content sending unit 403, configured to provide data to be labeled corresponding to the difference content to an adjudicator, where the adjudicator labels the data to be labeled corresponding to the difference content to obtain a standard answer corresponding to the difference content;
and the performance statistics unit 405 may be configured to perform performance statistics on the plurality of annotating persons according to the standard answers.
While realizing the performance statistics, acquiring the standard answers also ensures the correctness of the labeled data returned to the publisher, and the standard answers can be sent to the labeling personnel to help improve their service level.
According to an exemplary embodiment of the present disclosure, referring to fig. 5, compared to the crowdsourcing platform implementation apparatus 3, the crowdsourcing platform implementation apparatus 5 may further include a data sending module 51 in addition to the data set acquiring module 31, the data set dividing module 33, the data subset sending module 35, the data receiving module 37, and the performance statistics module 39, wherein:
the data sending module 51 may be configured to send the same content of the labeled data of the plurality of labeling personnel and the standard answer to the relay warehouse for the publisher to obtain.
By sending the marked data to the transfer warehouse, on one hand, the data format of the crowdsourcing platform can be converted into the data format of the publisher, and the problem of contradiction between the data diversity of the publisher and the data normalization of the crowdsourcing platform is solved; on the other hand, the transit warehouse is configured between the issuing party and the crowdsourcing platform, so that the crowdsourcing platform can be prevented from being directly exposed to the issuing party, and the data security is improved.
According to an exemplary embodiment of the present disclosure, referring to fig. 6, compared to the crowdsourcing platform implementation apparatus 3, the crowdsourcing platform implementation apparatus 6 may further include an annotation suggestion acquisition module 61 and an annotation suggestion sending module 63, in addition to the data set acquisition module 31, the data set dividing module 33, the data subset sending module 35, the data receiving module 37, and the performance statistics module 39, where:
an annotation suggestion acquisition module 61, operable to acquire an annotation suggestion for the data subset;
the annotation suggestion sending module 63 may be configured to send the annotation suggestion to the user side when the data subset is sent to the user side, so that the annotator can obtain the annotation suggestion.
By sending the annotation suggestion to the annotating personnel, the annotating personnel can conveniently and rapidly master the annotation mode, unnecessary errors are reduced, and the annotation efficiency is improved.
According to an exemplary embodiment of the present disclosure, referring to fig. 7, the annotation suggestion acquisition module 61 may include a standard data set acquisition unit 701, an annotation suggestion determination unit 703 and an annotation suggestion acquisition unit 705; wherein:
a standard data set acquisition unit 701, which can be used to acquire a standard data set;
an annotation suggestion determination unit 703, which may be configured to determine an annotation suggestion for the subset of data based on the standard dataset;
the annotation suggestion obtaining unit 705 may be configured to obtain the annotation suggestion.
According to an exemplary embodiment of the present disclosure, referring to fig. 8, compared to the crowdsourcing platform implementation apparatus 3, the crowdsourcing platform implementation apparatus 8 may further include an annotation rule obtaining module 81 in addition to the data set obtaining module 31, the data set dividing module 33, the data subset sending module 35, the data receiving module 37, and the performance statistics module 39, where:
the annotation rule obtaining module 81 may be configured to, when the data subset is sent to the user side, obtain an annotation rule for the data subset to control the user side to render an annotation interface based on the annotation rule.
Acquiring the labeling rule helps standardize the labeling operation of the labeling personnel, avoids errors that may occur in the business logic, and ensures the validity of the labeled data.
It is to be readily understood that although the data sending module 51 is described in the crowdsourcing platform implementation device 5, the data sending module 51 may also be included in the crowdsourcing platform implementation device 6 and/or the crowdsourcing platform implementation device 8. Similarly, although the annotation suggestion acquisition module 61 and the annotation suggestion sending module 63 are described in the crowdsourcing platform implementation device 6, they may also be included in the crowdsourcing platform implementation device 5 and/or the crowdsourcing platform implementation device 8. Likewise, although the annotation rule obtaining module 81 is described in the crowdsourcing platform implementation device 8, it may also be included in the crowdsourcing platform implementation device 5 and/or the crowdsourcing platform implementation device 6.
Since each functional module of the crowdsourcing platform implementation apparatus of this embodiment corresponds to the method embodiment described above, details are not repeated here.
In an exemplary embodiment of the present disclosure, there is also provided a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, aspects of the invention may also be implemented in the form of a program product comprising program code means for causing a terminal device to carry out the steps according to various exemplary embodiments of the invention described in the above section "exemplary methods" of the present description, when said program product is run on the terminal device.
Referring to fig. 9, a program product 900 for implementing the above method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java or C++ and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
In an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or program product. Thus, various aspects of the invention may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module," or "system."
An electronic device 1000 according to this embodiment of the invention is described below with reference to fig. 10. The electronic device 1000 shown in fig. 10 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 10, the electronic device 1000 is embodied in the form of a general purpose computing device. The components of the electronic device 1000 may include, but are not limited to: the at least one processing unit 1010, the at least one memory unit 1020, a bus 1030 connecting different system components (including the memory unit 1020 and the processing unit 1010), and a display unit 1040.
Wherein the storage unit stores program code that is executable by the processing unit 1010 to cause the processing unit 1010 to perform steps according to various exemplary embodiments of the present invention as described in the "exemplary methods" section above in this specification. For example, the processing unit 1010 may execute step S10 shown in fig. 1: acquiring a data set to be marked which is sent by a publisher and subjected to standardized processing by a transfer warehouse; step S12: dividing the data set to be annotated into data subsets; step S14: sending the data subset to a user side for labeling by a labeling person; step S16: receiving the marked data sent by the user side; and step S18: and comparing the labeled data of a plurality of labeling personnel labeling the same data subset, and performing performance statistics on the plurality of labeling personnel according to the comparison result.
The storage unit 1020 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM)10201 and/or a cache memory unit 10202, and may further include a read-only memory unit (ROM) 10203.
The memory unit 1020 may also include a program/utility 10204 having a set (at least one) of program modules 10205, such program modules 10205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 1030 may represent one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures.
The electronic device 1000 may also communicate with one or more external devices 1100 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 1000, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 1000 to communicate with one or more other computing devices. Such communication may occur through input/output (I/O) interfaces 1050. Also, the electronic device 1000 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the internet) via the network adapter 1060. As shown, the network adapter 1060 communicates with the other modules of the electronic device 1000 over the bus 1030. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 1000, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
Furthermore, the above-described figures are merely schematic illustrations of processes involved in methods according to exemplary embodiments of the invention, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is to be limited only by the terms of the appended claims.

Claims (7)

1. A method for implementing a crowdsourcing platform, comprising:
acquiring a data set to be labeled, which is sent by a publisher and normalized by a transit warehouse, wherein the data set to be labeled is obtained by the transit warehouse converting the data format of a received data set after judging that the data format differs from the data format uniformly processed by the crowdsourcing platform, and/or by the transit warehouse storing the received data set after judging that its data format is the same as the data format uniformly processed by the crowdsourcing platform;
dividing the data set to be annotated into data subsets;
sending the data subset to a user side for labeling by a labeling person;
receiving the marked data sent by the user side; and
determining the difference content among the labeled data of a plurality of labeling personnel for labeling the same data subset;
providing the data to be labeled corresponding to the difference content to an adjudicator, and labeling the data to be labeled corresponding to the difference content by the adjudicator to obtain a standard answer corresponding to the difference content, wherein the adjudicator is one or more people with high labeling accuracy rate selected in advance;
performing performance statistics on the plurality of marked personnel according to the standard answers;
wherein the dividing the data set to be annotated into data subsets comprises:
dividing the data set to be marked into data subsets according to the number of marking personnel and/or the data type in the data set to be marked and/or a data set division algorithm;
wherein sending the data subset to the user side for labeling by the labeling personnel comprises:
acquiring a labeling suggestion aiming at the data subset, and when the data subset is sent to a user side, sending the labeling suggestion to the user side so that the labeling personnel can obtain the labeling suggestion;
wherein the annotation suggestion can be determined based on a standard data set and/or historical annotation suggestions and/or annotation preferences in conjunction with a machine learning training process.
2. The crowdsourcing platform implementation method of claim 1, wherein the crowdsourcing platform implementation method further comprises:
and sending the same content of the labeled data of the plurality of labeling personnel and the standard answers to the transfer warehouse for the publisher to obtain.
3. The crowdsourcing platform implementation method of claim 1, wherein obtaining annotation suggestions for the subset of data comprises:
acquiring a standard data set;
determining annotation suggestions for the subset of data based on the standard dataset;
and acquiring the annotation suggestion.
4. The crowdsourcing platform implementation method of claim 1 or 3, wherein the crowdsourcing platform implementation method further comprises:
and when the data subset is sent to a user side, obtaining a labeling rule aiming at the data subset to control the user side to render a labeling interface based on the labeling rule.
5. An apparatus for implementing a crowdsourcing platform, comprising:
a data set acquisition module, configured to acquire a data set to be labeled, which is sent by a publisher and normalized by a transit warehouse, wherein the data set to be labeled is obtained by the transit warehouse converting the data format of a received data set after judging that the data format differs from the data format uniformly processed by the crowdsourcing platform, and/or by the transit warehouse storing the received data set after judging that its data format is the same as the data format uniformly processed by the crowdsourcing platform;
the data set dividing module is used for dividing the data set to be labeled into data subsets;
the data subset sending module is used for sending the data subset to a user side for labeling by a labeling person;
the data receiving module is used for receiving the marked data sent by the user side; and
the performance statistics module is used for comparing the labeled data of a plurality of labeling personnel labeling the same data subset and performing performance statistics on the plurality of labeling personnel according to the comparison result;
the performance statistics module includes:
the difference content determining unit is used for determining the difference content among the labeled data of a plurality of labeling personnel for labeling the same data subset;
the system comprises a difference content sending unit, a judging unit and a judging unit, wherein the difference content sending unit is used for providing data to be labeled corresponding to the difference content to a judging person, and the judging person labels the data to be labeled corresponding to the difference content to obtain a standard answer corresponding to the difference content;
the performance statistics unit is used for performing performance statistics on the plurality of marking personnel according to the standard answers, and the adjudicator is one or more people with high marking accuracy rate which are selected in advance;
wherein the data set partitioning module is configured to:
dividing the data set to be marked into data subsets according to the number of marking personnel and/or the data type in the data set to be marked and/or a data set division algorithm;
wherein the data subset transmitting module is configured to:
acquiring a labeling suggestion aiming at the data subset, and when the data subset is sent to a user side, sending the labeling suggestion to the user side so that the labeling personnel can obtain the labeling suggestion;
wherein the annotation suggestion can be determined based on a standard data set and/or historical annotation suggestions and/or annotation preferences in conjunction with a machine learning training process.
6. A storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the crowdsourcing platform implementation method of any one of claims 1 to 4.
7. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the crowdsourcing platform implementation method of any one of claims 1 to 4 via execution of the executable instructions.
CN201711014277.XA 2017-10-26 2017-10-26 Crowdsourcing platform implementation method and device, storage medium and electronic equipment Active CN107705034B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711014277.XA CN107705034B (en) 2017-10-26 2017-10-26 Crowdsourcing platform implementation method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711014277.XA CN107705034B (en) 2017-10-26 2017-10-26 Crowdsourcing platform implementation method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN107705034A CN107705034A (en) 2018-02-16
CN107705034B true CN107705034B (en) 2021-06-29

Family

ID=61182985

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711014277.XA Active CN107705034B (en) 2017-10-26 2017-10-26 Crowdsourcing platform implementation method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN107705034B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111902829A (en) * 2018-03-29 2020-11-06 索尼公司 Information processing apparatus, information processing method, and program
CN108647858A (en) * 2018-04-12 2018-10-12 华东师范大学 A kind of collaboration crowdsourcing method of quality control based on user's inconsistency information
CN109033220B (en) * 2018-06-29 2022-09-06 北京京东尚科信息技术有限公司 Automatic selection method, system, equipment and storage medium of labeled data
CN108984490A (en) * 2018-07-17 2018-12-11 北京猎户星空科技有限公司 A kind of data mask method, device, electronic equipment and storage medium
CN109102198B (en) * 2018-08-23 2021-08-03 创新先进技术有限公司 Image crowdsourcing annotation method and device
CN109376260B (en) * 2018-09-26 2021-10-01 四川长虹电器股份有限公司 Method and system for deep learning image annotation
CN109740622A (en) * 2018-11-20 2019-05-10 众安信息技术服务有限公司 Image labeling task crowdsourcing method and system based on the logical card award method of block chain
CN109993315B (en) * 2019-03-29 2021-05-18 联想(北京)有限公司 Data processing method and device and electronic equipment
CN111080092B (en) * 2019-11-29 2023-04-18 北京云聚智慧科技有限公司 Data annotation management method and device, electronic equipment and readable storage medium
CN111881657A (en) * 2020-08-04 2020-11-03 厦门渊亭信息科技有限公司 Intelligent marking method, terminal equipment and storage medium
CN113157170B (en) * 2021-03-25 2022-09-20 北京百度网讯科技有限公司 Data labeling method and device
CN113536186A (en) * 2021-07-30 2021-10-22 贵阳高新数通信息有限公司 Webpage information labeling and extracting system
CN113469291B (en) * 2021-09-01 2021-11-30 平安科技(深圳)有限公司 Data processing method and device, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104200334A (en) * 2014-09-16 2014-12-10 张文铸 Crowdsourcing project management control system and method based on SaaS mode
CN106156025A (en) * 2015-03-25 2016-11-23 阿里巴巴集团控股有限公司 The management method of a kind of data mark and device
CN106651097A (en) * 2016-09-30 2017-05-10 深圳市华傲数据技术有限公司 Data collection method, data collection device and data collection server based on crowd sourcing
CN107273492A (en) * 2017-06-15 2017-10-20 复旦大学 A kind of exchange method based on mass-rent platform processes image labeling task

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140214468A1 (en) * 2013-01-25 2014-07-31 IdeaConnection Ltd Method of managing a confidential moderated crowdsource problem solving system
US20140214632A1 (en) * 2013-01-31 2014-07-31 Wal-Mart Stores, Inc. Smart Crowd Sourcing On Product Classification
CN104573359B (en) * 2014-12-31 2017-08-08 浙江大学 A kind of mass-rent labeled data integration method of task based access control difficulty and mark person's ability
CN107122786B (en) * 2016-02-25 2021-01-08 华为技术有限公司 Crowdsourcing learning method and device
WO2018000269A1 (en) * 2016-06-29 2018-01-04 深圳狗尾草智能科技有限公司 Data annotation method and system based on data mining and crowdsourcing

Also Published As

Publication number Publication date
CN107705034A (en) 2018-02-16

Similar Documents

Publication Publication Date Title
CN107705034B (en) Crowdsourcing platform implementation method and device, storage medium and electronic equipment
CN109584975B (en) Medical data standardization processing method and device
KR102478657B1 (en) Automatic extraction of commitments and requests from communications and content
CN111695613B (en) Data annotation system, computer-readable storage medium, and electronic device
CN112711581B (en) Medical data checking method and device, electronic equipment and storage medium
US11194553B2 (en) Identifying and recommending code snippets to be reused by software developer
CN108776696B (en) Node configuration method and device, storage medium and electronic equipment
CN111159220B (en) Method and apparatus for outputting structured query statement
CN112559865B (en) Information processing system, computer-readable storage medium, and electronic device
US20190065624A1 (en) Method and device for obtaining answer, and computer device
CN114424257A (en) Automatic rendering and extraction of form data using machine learning
CN110647523B (en) Data quality analysis method and device, storage medium and electronic equipment
CN110825431A (en) Interface document processing method, device, system, storage medium and electronic equipment
CN111753140A (en) XML file parsing method and related equipment
CN112130781A (en) Log printing method and device, electronic equipment and storage medium
CN115525292A (en) Low code development method and device, readable storage medium and electronic equipment
CN110738056A (en) Method and apparatus for generating information
CN111274813B (en) Language sequence labeling method, device storage medium and computer equipment
CN111913922A (en) Binary structured log generation method, device, equipment and storage medium
CN113553826A (en) Information input method and device combining RPA and AI and electronic equipment
CN110224921B (en) Method, device, medium and electronic equipment for sending mail containing recruitment information
CN111400759A (en) Visiting time table generation method and device, storage medium and electronic equipment
CN110796551A (en) Automatic control method, device and system for fund management
US10699329B2 (en) Systems and methods for document to order conversion
CN106570002B (en) Natural language processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant