CN113469291B - Data processing method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN113469291B
CN113469291B (application CN202111022497.3A)
Authority
CN
China
Prior art keywords
target
labeling
test question
labeled
annotator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111022497.3A
Other languages
Chinese (zh)
Other versions
CN113469291A (en)
Inventor
姜敏华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202111022497.3A
Publication of CN113469291A
Application granted
Publication of CN113469291B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/40 Software arrangements specially adapted for pattern recognition, e.g. user interfaces or toolboxes therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The application relates to the technical field of artificial intelligence and provides a data processing method, an apparatus, an electronic device and a storage medium. The method comprises the following steps: parsing a received data annotation task request to acquire a plurality of target annotators, a start-annotation instruction and an end-annotation instruction; determining a target test question set for each target annotator according to the annotation portraits of the plurality of target annotators; simultaneously sending the corresponding target test question sets to the plurality of target annotators, and simultaneously collecting their annotation results; performing label inference on the annotation results of the plurality of target annotators according to a preset inference algorithm to obtain a feedback result for each target annotator; and sending the feedback result of each target annotator to the corresponding target annotator client. Because the correct annotation answer of each test question to be labeled is inferred by a preset inference algorithm rather than determined directly from the raw annotation results, the accuracy and efficiency of the feedback results are improved.

Description

Data processing method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a data processing method and apparatus, an electronic device, and a storage medium.
Background
A high-quality labeled data set is a vital resource in computer research and application. In a conventional annotation system, when the same corpus must be labeled manually by several annotators, their annotation times are inconsistent, so the annotation results cannot be processed immediately and the feedback of data processing results is inefficient.
In addition, some public-interest annotation systems give no feedback at all: the annotator merely completes the assigned workload, never learns the correct annotation answers or the related knowledge, and therefore improves his or her professional competence only slowly.
Disclosure of Invention
In view of the above, it is necessary to provide a data processing method, an apparatus, an electronic device, and a storage medium in which the correct annotation answer of each test question to be labeled is inferred by a preset inference algorithm rather than determined directly from the raw annotation results, so as to improve the accuracy and efficiency of the feedback results.
A first aspect of the present application provides a data processing method, the method including:
parsing a received data annotation task request to acquire a plurality of target annotators, a start-annotation instruction and an end-annotation instruction;
determining a target test question set corresponding to each target annotator according to the annotation portraits of the plurality of target annotators;
in response to the start-annotation instruction, simultaneously sending the corresponding target test question sets to the plurality of target annotators, and in response to the end-annotation instruction, simultaneously collecting the annotation results of the plurality of target annotators;
performing label inference on the annotation results of the plurality of target annotators according to a preset inference algorithm to obtain a feedback result for each target annotator;
and sending the feedback result of each target annotator to the corresponding target annotator client.
Optionally, performing label inference on the annotation results of the plurality of target annotators according to a preset inference algorithm to obtain the feedback result of each target annotator includes:
performing label inference on the annotation results of the plurality of target annotators according to a weighted-voting inference algorithm to obtain the feedback result of each target annotator; or
performing label inference on the annotation results of the plurality of target annotators according to a maximum-likelihood inference algorithm to obtain the feedback result of each target annotator; or
performing label inference on the annotation results of the plurality of target annotators according to a set inference algorithm to obtain the feedback result of each target annotator.
Optionally, performing label inference on the annotation results of the plurality of target annotators according to a weighted-voting inference algorithm to obtain the feedback result of each target annotator includes:
identifying a plurality of reference questions in the target test question set of each target annotator;
matching the first annotation result of each reference question against the standard result of the corresponding reference question to obtain a first score of each target annotator;
calculating a second score over the plurality of reference questions of each target annotator;
calculating the quotient of the first score and the second score to determine a confidence of each target annotator;
extracting a second to-be-labeled test question set from the target test question sets of the plurality of target annotators;
identifying the second annotation result of each test question to be labeled in the second to-be-labeled test question set, wherein the second annotation result comprises one or more annotation results;
when a test question to be labeled has exactly one second annotation result, determining the number of that second annotation result as the number corresponding to the correct annotation answer of the test question; or, when a test question to be labeled has a plurality of second annotation results, calculating the number corresponding to its correct annotation answer from the plurality of second annotation results with the following formula:
y_i = argmax_w Σ_{j=1}^{J} c_j · δ_{ijw}
wherein y_i denotes the number corresponding to the correct annotation answer of the i-th test question to be labeled, J denotes the total number of target annotators of each test question to be labeled, δ_{ijw} indicates whether the j-th target annotator gave the w-th second annotation result for the i-th test question to be labeled, and c_j denotes the confidence of the target annotator corresponding to each second annotation result;
determining the correct annotation answer corresponding to each test question to be labeled according to the number corresponding to its correct annotation answer;
and analyzing the target test question set corresponding to each target annotator according to the correct annotation answer corresponding to each test question to be labeled, obtaining the feedback result of each target annotator.
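As a sketch only, the weighted-voting inference above can be expressed in a few lines of Python; the function names (`annotator_confidence`, `weighted_vote`) and the flat list-based data shapes are illustrative assumptions, not part of the patent:

```python
from collections import defaultdict

def annotator_confidence(first_score, second_score):
    """Confidence of a target annotator: the quotient of the score earned on
    the reference questions (first score) over the total attainable score on
    those reference questions (second score)."""
    return first_score / second_score

def weighted_vote(labels, confidences):
    """Infer the answer number of one test question to be labeled: each
    candidate answer accumulates the confidences of the annotators who chose
    it, and the answer with the largest weighted vote wins."""
    votes = defaultdict(float)
    for label, conf in zip(labels, confidences):
        votes[label] += conf
    return max(votes, key=votes.get)

# Three annotators label the same question; the two lower-confidence
# annotators agree, and their combined weight (1.1) outvotes the third (0.9).
answer = weighted_vote(labels=[1, 1, 2], confidences=[0.6, 0.5, 0.9])
```

The answer inferred here is 1: even a highly confident annotator is outvoted when enough moderately confident annotators agree against him or her.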
Optionally, performing label inference on the annotation results of the plurality of target annotators according to a maximum-likelihood inference algorithm to obtain the feedback result of each target annotator includes:
identifying a plurality of reference questions in the target test question set of each target annotator;
matching the first annotation result of each reference question against the standard result of the corresponding reference question to obtain a first score of each target annotator;
calculating a second score over the plurality of reference questions of each target annotator;
calculating the quotient of the first score and the second score to determine a confidence of each target annotator;
extracting a second to-be-labeled test question set from the target test question sets of the plurality of target annotators;
identifying the second annotation result of each test question to be labeled in the second to-be-labeled test question set, wherein the second annotation result comprises one or more annotation results;
when a test question to be labeled has exactly one second annotation result, determining the number of that second annotation result as the number corresponding to the correct annotation answer of the test question; or, when a test question to be labeled has a plurality of second annotation results, calculating the number corresponding to its correct annotation answer from the plurality of second annotation results with the following formula:
y_i = argmax_w ∏_{j=1}^{J} c_j^{δ_{ijw}} · (1 − c_j)^{(1 − δ_{ijw})}
wherein y_i denotes the number corresponding to the correct annotation answer of the i-th test question to be labeled, J denotes the total number of target annotators of each test question to be labeled, δ_{ijw} indicates whether the j-th target annotator gave the w-th second annotation result for the i-th test question to be labeled, (1 − δ_{ijw}) is its complement, and c_j denotes the confidence of the target annotator corresponding to each second annotation result;
determining the correct annotation answer corresponding to each test question to be labeled according to the number corresponding to its correct annotation answer;
and analyzing the target test question set corresponding to each target annotator according to the correct annotation answer corresponding to each test question to be labeled, obtaining the feedback result of each target annotator.
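A minimal Python sketch of the maximum-likelihood inference described above; the function name `ml_vote` and the assumption that every confidence lies strictly between 0 and 1 are illustrative choices, not part of the patent:

```python
import math

def ml_vote(labels, confidences):
    """Infer the answer number of one test question by maximum likelihood:
    a candidate answer w is scored by the probability of the observed labels
    given that w is correct, i.e. each annotator who chose w contributes c_j
    and each annotator who chose otherwise contributes (1 - c_j).
    Confidences are assumed to lie strictly between 0 and 1."""
    def log_likelihood(w):
        return sum(
            math.log(c) if label == w else math.log(1.0 - c)
            for label, c in zip(labels, confidences)
        )
    return max(set(labels), key=log_likelihood)

# With these votes the single high-confidence dissenter wins:
# P(1) = 0.6 * 0.5 * 0.1 = 0.03  <  P(2) = 0.4 * 0.5 * 0.9 = 0.18.
answer = ml_vote([1, 1, 2], [0.6, 0.5, 0.9])
```

Note the contrast with simple weighted voting on the same input: the likelihood model penalizes an answer heavily when a very confident annotator rejects it, so here answer 2 is inferred.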
Optionally, parsing the received data annotation task request to acquire the plurality of target annotators, the start-annotation instruction and the end-annotation instruction includes:
parsing the message of the data annotation task request to obtain the message information it carries;
acquiring, from the message information, a plurality of pieces of login information, a start time of the annotation task and an end time of the annotation task, wherein each piece of login information comprises an annotator login name and a login device identification code;
determining, from a preset annotator database, the target annotator matching each annotator login name, and associating each target annotator with the corresponding login device identification code;
and generating the start-annotation instruction from the associated target annotators and the start time of the annotation task, and the end-annotation instruction from the associated target annotators and the end time of the annotation task.
Optionally, determining the target test question set corresponding to each target annotator according to the annotation portraits of the plurality of target annotators includes:
identifying the login information of each target annotator, and acquiring the annotation portrait of each target annotator based on the login information;
inputting the annotation portraits of the plurality of target annotators into a pre-trained group classification model to obtain the group category of each target annotator;
extracting a plurality of key labels from the annotation portrait of each target annotator, and determining the target test question set of each target annotator according to the plurality of key labels of each target annotator, the corresponding group category and the first to-be-labeled test question set in the data annotation task request, wherein the target test question set comprises a plurality of reference questions and a second to-be-labeled test question set.
Optionally, determining the target test question set of each target annotator according to the plurality of key labels of each target annotator, the corresponding group category and the first to-be-labeled test question set in the data annotation task request includes:
determining the second to-be-labeled test question set of each target annotator from the first to-be-labeled test question set in the data annotation task request according to the plurality of key labels and the corresponding group category of each target annotator and a preset screening rule;
calculating the similarity between the corpus to be labeled in the data annotation task request and each labeled corpus in a constructed labeled corpus;
extracting, according to the calculated similarities, a plurality of first test questions corresponding to the labeled corpora with the highest similarity, extracting a plurality of second test questions from the plurality of first test questions according to the plurality of key labels and the corresponding group category of each target annotator, and determining the plurality of second test questions as the plurality of reference questions of each target annotator;
and combining the plurality of reference questions of each target annotator with the corresponding second to-be-labeled test question set to obtain the target test question set of each target annotator.
A second aspect of the present application provides a data processing apparatus, the apparatus comprising:
the parsing module is used for parsing a received data annotation task request to acquire a plurality of target annotators, a start-annotation instruction and an end-annotation instruction;
the determining module is used for determining the target test question set corresponding to each target annotator according to the annotation portraits of the plurality of target annotators;
the first sending module is used for simultaneously sending the corresponding target test question sets to the plurality of target annotators in response to the start-annotation instruction, and simultaneously collecting the annotation results of the plurality of target annotators in response to the end-annotation instruction;
the label inference module is used for performing label inference on the annotation results of the plurality of target annotators according to a preset inference algorithm to obtain the feedback result of each target annotator;
and the second sending module is used for sending the feedback result of each target annotator to the corresponding target annotator client.
A third aspect of the present application provides an electronic device comprising a processor and a memory, the processor being configured to implement the data processing method when executing a computer program stored in the memory.
A fourth aspect of the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the data processing method.
To sum up, in the data processing method, apparatus, electronic device and storage medium described above: on one hand, in the process of determining the target test question set of each target annotator, the target annotators are grouped according to their annotation portraits, so that a single target test question set is generated per group category; this reduces the number of target test question sets, speeds up their generation and thereby the feedback of annotation results. On the other hand, when inferring the correct annotation answer of a test question to be labeled, reference questions are added to the to-be-labeled test question set; the confidence of each target annotator is determined from his or her annotation of the reference questions, and the preset inference algorithm infers the correct annotation answer of each test question to be labeled from these confidences. The correct annotation answer is therefore not read directly off the raw annotation results; instead, the accuracy each target annotator achieved on the reference questions is taken into account, which guarantees the accuracy of the inferred answers and improves both the accuracy and the efficiency of the feedback results. Finally, the obtained feedback result is sent to the target annotator client. The feedback result includes, for each test question to be labeled, functions such as querying the associated knowledge points, the correct annotation answer, any wrong annotation answer with its explanation, and redoing the wrongly answered questions; on receiving it, the target annotator can study the analysis of the wrong questions and the related knowledge of each test question, further improving his or her professional competence.
Drawings
Fig. 1 is a flowchart of a data processing method according to an embodiment of the present application.
Fig. 2 is a structural diagram of a data processing apparatus according to a second embodiment of the present application.
Fig. 3 is a schematic structural diagram of an electronic device according to a third embodiment of the present application.
Detailed Description
In order that the above objects, features and advantages of the present application can be more clearly understood, a detailed description of the present application will be given below with reference to the accompanying drawings and specific embodiments. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.
Example one
Fig. 1 is a flowchart of a data processing method according to an embodiment of the present application.
In this embodiment, the data processing method may be applied to an electronic device, and for an electronic device that needs to perform data processing, the data processing function provided by the method of the present application may be directly integrated on the electronic device, or may be run in the electronic device in the form of a Software Development Kit (SDK).
The embodiment of the application can acquire and process the relevant data based on artificial intelligence technology. Artificial Intelligence (AI) here refers to the theories, methods, techniques and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use that knowledge to obtain the best results.
The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning, deep learning and the like.
As shown in fig. 1, the data processing method specifically includes the following steps, and the order of the steps in the flowchart may be changed and some steps may be omitted according to different requirements.
And S11, parsing the received data annotation task request to acquire a plurality of target annotators, a start-annotation instruction and an end-annotation instruction.
In this embodiment, when a user carries out an annotation task, a data annotation task request is initiated to a server through a client. Specifically, the client may be a smartphone, an iPad or another smart device, and the server may be a crowdsourced annotation system used for manual, real-time, parallel annotation and verification of large-scale data. During an annotation task the client sends a data annotation task request to the crowdsourced annotation system; when the system receives the request, it parses the request and acquires the plurality of target annotators, the start-annotation instruction and the end-annotation instruction.
In an optional embodiment, parsing the received data annotation task request to acquire the plurality of target annotators, the start-annotation instruction and the end-annotation instruction includes:
parsing the message of the data annotation task request to obtain the message information it carries;
acquiring, from the message information, a plurality of pieces of login information, a start time of the annotation task and an end time of the annotation task, wherein each piece of login information comprises an annotator login name and a login device identification code;
determining, from a preset annotator database, the target annotator matching each annotator login name, and associating each target annotator with the corresponding login device identification code;
and generating the start-annotation instruction from the associated target annotators and the start time of the annotation task, and the end-annotation instruction from the associated target annotators and the end time of the annotation task.
In this embodiment, when a corpus is labeled in the crowdsourced annotation system, one or more target annotators may be needed for the same corpus to be labeled. When several target annotators are needed, feedback on the annotations can only be given after all of them have finished; because their annotation times are not uniform, it is difficult to obtain the feedback result immediately after each target annotator finishes. This scheme therefore generates a start-annotation instruction and an end-annotation instruction from the start and end times of the annotation task when the task request is received, sends both instructions to the target annotators when the annotation task is subsequently executed, and has each target annotator execute the task in response to these instructions from the crowdsourced annotation system. This solves the problem that the non-uniform annotation times of multiple target annotators prevent the feedback result from being obtained immediately.
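The request-parsing flow above (resolve login names, associate device identification codes, derive start and end instructions from the task times) can be sketched as follows; the field names, the `AnnotationInstruction` container and the dictionary-shaped payload are all hypothetical:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class AnnotationInstruction:
    annotators: list      # target annotators, each tied to a login device ID
    fire_at: datetime     # when the instruction takes effect

def build_instructions(message_info, annotator_db):
    """Resolve each login name against the annotator database, associate the
    matched target annotator with its login device identification code, and
    derive the start- and end-annotation instructions from the task times."""
    targets = []
    for login in message_info["logins"]:
        annotator = annotator_db[login["login_name"]]  # match by login name
        targets.append({"annotator": annotator, "device_id": login["device_id"]})
    start = AnnotationInstruction(targets, message_info["start_time"])
    end = AnnotationInstruction(targets, message_info["end_time"])
    return start, end

# Hypothetical request payload: two annotators, a 9:00 start and an 18:00 end.
db = {"alice": "Annotator-A", "bob": "Annotator-B"}
info = {
    "logins": [{"login_name": "alice", "device_id": "dev-1"},
               {"login_name": "bob", "device_id": "dev-2"}],
    "start_time": datetime(2021, 9, 1, 9, 0),
    "end_time": datetime(2021, 9, 1, 18, 0),
}
start, end = build_instructions(info, db)
```

Both instructions carry the same associated annotator list, so the system can dispatch the test question sets and collect the results for all target annotators at the same moment.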
And S12, determining the target test question set corresponding to each target annotator according to the annotation portraits of the plurality of target annotators.
In this embodiment, when multiple target annotators are needed to label the same corpus, the corpus may involve knowledge from several fields, so the target annotators may also come from different fields; the annotation portrait of each target annotator differs, and so do the corresponding target test question sets.
In an optional embodiment, determining the target test question set corresponding to each target annotator according to the annotation portraits of the target annotators includes:
identifying the login information of each target annotator, and acquiring the annotation portrait of each target annotator based on the login information;
inputting the annotation portraits of the plurality of target annotators into a pre-trained group classification model to obtain the group category of each target annotator;
extracting a plurality of key labels from the annotation portrait of each target annotator, and determining the target test question set of each target annotator according to the plurality of key labels of each target annotator, the corresponding group category and the first to-be-labeled test question set in the data annotation task request, wherein the target test question set comprises a plurality of reference questions and a second to-be-labeled test question set.
In this embodiment, the annotation portrait of each target annotator can be obtained from the login name in his or her login information. Specifically, the annotation portrait includes a plurality of key labels, which must be considered in the subsequent determination of the target test question set; the key labels of each target annotator may include one or a combination of the following: annotation grade, annotation field range, historical annotation level, label completion amount, and the like.
In this embodiment, a plurality of target annotators can correspond to one target test question set or to several, and the test questions to be labeled and the reference questions may be repeated across sets. By grouping the target annotators according to their annotation portraits, this embodiment generates one target test question set for all target annotators of a group category. On one hand, this reduces the number of target test question sets, speeds up their generation and thereby the feedback of annotation results; at the same time, because several target annotators label the same target test question set simultaneously and the same test question to be labeled can appear in different target test question sets, the number of annotation results per test question increases, which improves the accuracy of the correct annotation answers determined later.
In this embodiment, a group classification model may be trained in advance; labels such as the annotation grade, annotation field range, historical annotation level and label completion amount of an annotator are input into the pre-trained group classification model, which identifies the annotator's group category.
Optionally, the training process of the group classification model includes:
21) acquiring a plurality of annotators and their corresponding group categories;
22) extracting the plurality of preset key labels from the annotation portrait of each annotator;
23) taking the plurality of key labels as a sample data set;
24) dividing the sample data set into a training set and a test set;
25) inputting the training set into a preset neural network for training to obtain the group classification model;
26) inputting the test set into the group classification model for testing, and calculating the test pass rate;
27) when the test pass rate is greater than or equal to a preset pass-rate threshold, finishing the training of the group classification model; when the test pass rate is smaller than the preset pass-rate threshold, enlarging the training set and re-training the group classification model.
In this embodiment, group categories may be preset according to the key labels of the annotators. A plurality of annotators of each category are obtained and the preset key labels in each annotator's annotation portrait are extracted; the group classification model is then trained on these key labels and the group category identifiers. Afterwards, the group category of any annotator can be identified simply by obtaining the key labels from his or her annotation portrait and feeding them to the model.
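The training loop of steps 21) to 27) can be sketched as follows. Everything here is an illustrative assumption: a nearest-centroid classifier stands in for the preset neural network, the key labels are pre-encoded as numeric vectors, and the function names are invented for this sketch:

```python
import random

def fit_centroids(train):
    """Mean key-label vector per group category (stand-in for the network)."""
    sums, counts = {}, {}
    for features, category in train:
        acc = sums.setdefault(category, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[category] = counts.get(category, 0) + 1
    return {cat: [v / counts[cat] for v in acc] for cat, acc in sums.items()}

def predict(model, features):
    """Assign the group category whose centroid is nearest."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda cat: sq_dist(model[cat]))

def train_group_classifier(samples, pass_threshold=0.8, max_rounds=5):
    """Steps 21)-27) as a loop: split the (key-label vector, group category)
    samples into a training set and a test set, fit a model, measure the test
    pass rate, and enlarge the training set whenever the rate misses the
    threshold."""
    samples = list(samples)
    random.Random(0).shuffle(samples)          # deterministic split
    train_frac, model = 0.7, None
    for _ in range(max_rounds):
        cut = max(1, int(len(samples) * train_frac))
        train, test = samples[:cut], samples[cut:]
        model = fit_centroids(train)
        if not test:
            break
        pass_rate = sum(predict(model, x) == y for x, y in test) / len(test)
        if pass_rate >= pass_threshold:        # step 27: training finished
            break
        train_frac = min(0.95, train_frac + 0.1)  # enlarge the training set
    return model
```

Real key labels (annotation grade, field range, historical level, completion amount) would first have to be encoded numerically before such a split-train-test loop could run on them.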
Further, determining the target test question set of each target annotator according to the plurality of key labels of each target annotator, the corresponding group category and the first to-be-labeled test question set in the data annotation task request includes:
determining a second to-be-labeled test question set of each target label from a first to-be-labeled test question set in the data labeling task request according to a plurality of key labels and corresponding group categories of each target label and a preset screening rule;
calculating the similarity between the linguistic data to be labeled in the data labeling task request and each labeled linguistic data in the constructed labeled corpus;
extracting, according to the calculated similarities, a plurality of first test questions corresponding to the labeling corpora with the highest similarity, extracting a plurality of second test questions from the plurality of first test questions according to the plurality of key labels and corresponding group category of each target annotator, and determining the plurality of second test questions as the plurality of reference questions of each target annotator;
and combining the plurality of reference questions of each target annotator and the corresponding second test question set to be annotated to obtain the target test question set of each target annotator.
In this embodiment, the first test question corresponding to each labeling corpus in the constructed labeled corpus includes a correct labeling answer. A target test question set is determined for each target annotator, and reference questions are added to that set, so that the confidence of each target annotator can be determined according to that annotator's accuracy on the reference questions. When performing label inference on the test questions to be labeled, considering each target annotator's accuracy on the reference questions improves the accuracy of the correct labeling answers.
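The reference-question selection above can be sketched as follows, using a token-set Jaccard measure as a hypothetical stand-in for the unspecified similarity calculation; the function names and the top-k cutoff are assumptions for illustration.

```python
def jaccard_similarity(text_a, text_b):
    """Hypothetical stand-in for the similarity measure between the corpus
    to be labeled and a labeled corpus: token-set Jaccard overlap."""
    a, b = set(text_a.split()), set(text_b.split())
    return len(a & b) / len(a | b) if a | b else 0.0

def build_target_question_set(corpus_to_label, labeled_corpus, to_label_set, top_k):
    """labeled_corpus: list of (corpus_text, first_test_question) pairs, where
    each first test question carries a known correct labeling answer.
    Returns the second to-be-labeled question set combined with the top-k most
    similar labeled questions, which serve as reference questions."""
    ranked = sorted(labeled_corpus,
                    key=lambda item: jaccard_similarity(corpus_to_label, item[0]),
                    reverse=True)
    reference_questions = [question for _, question in ranked[:top_k]]
    return to_label_set + reference_questions
```

A per-annotator filter on key labels and group category (the preset screening rule) would then narrow the reference questions further, as the steps above describe.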
Specifically, the construction process of the annotation-containing corpus includes:
collecting a disclosed public corpus from a plurality of preset data sources;
sending an authorization request to the publisher corresponding to the non-public labeled corpus; when an authorization instruction of the publisher is detected, collecting the non-public labeled corpus in response to the authorization instruction, and loading the non-public labeled corpus into the public corpus to obtain the labeled corpus.
In this embodiment, a plurality of data sources can be preset; a preset data source can be a data storage platform, a third-party application platform, a verification data platform and the like, and a public corpus having an association relation with the labeled corpus in the labeling task can be collected through the preset data sources. Specifically, in the process of constructing the labeled corpus, two dimensions are considered: the disclosed public corpus and the undisclosed corpus to be labeled. This ensures the diversity and integrity of the constructed labeled corpus.
And S13, responding to the marking starting instruction, sending corresponding target test question sets to the target annotators at the same time, and responding to the marking ending instruction, and collecting marking results of the target annotators at the same time.
In this embodiment, in order to ensure the feedback efficiency and accuracy of the labeling result, the corresponding target test item sets may be sent to the multiple target annotators at the same time for labeling, and the labeling results of the multiple target annotators may be collected at the same time in response to the instruction for ending the labeling.
And S14, performing label calculation on the labeling results of the plurality of target annotators according to a preset inference algorithm to obtain a feedback result of each target annotator.
In this embodiment, the preset inference algorithm includes one or more of the following combinations: weighted voting inference algorithms, maximum likelihood inference algorithms, and set inference algorithms.
In an optional embodiment, the tag calculating the labeling results of the plurality of target annotators according to a preset inference algorithm to obtain the feedback result of each target annotator includes:
performing label calculation on the labeling results of the plurality of target annotators according to a weighted voting inference algorithm to obtain a feedback result of each target annotator; or
performing label calculation on the labeling results of the plurality of target annotators according to a maximum likelihood inference algorithm to obtain a feedback result of each target annotator; or
performing label calculation on the labeling results of the plurality of target annotators according to a set inference algorithm to obtain a feedback result of each target annotator.
Further, the label calculation of the labeling results of the plurality of target annotators according to a weighted voting inference algorithm to obtain the feedback result of each target annotator includes:
identifying a plurality of reference questions in the target test question set of each target annotator;
matching the first labeling result of each reference question with the standard result of the corresponding reference question to obtain a first score of each target annotator;
calculating a second score over the plurality of reference questions of each target annotator;
calculating a quotient between the first score and the second score, determining a confidence for each of the target annotators;
extracting a second test question set to be labeled from the target test question sets of the plurality of target annotators;
identifying a second labeling result of each labeling test question in the second test question set to be labeled, wherein the second labeling result comprises one or more labeling results;
when each test question to be labeled contains a single second labeling result, determining the number corresponding to that second labeling result as the number corresponding to the correct labeling answer of the test question to be labeled; or, when each test question to be labeled contains a plurality of second labeling results, calculating the number corresponding to the correct labeling answer of each test question to be labeled according to the plurality of second labeling results, using the following formula:
n_{iw} = \sum_{j \in L_{iw}} c_j

wherein n_{iw} denotes the number corresponding to the correct labeling answer of the i-th test question to be labeled, j indexes the target annotators of each test question to be labeled, L_{iw} denotes the set of target annotators whose second labeling result for the i-th test question to be labeled is w, and c_j denotes the confidence of the target annotator corresponding to each second labeling result;
determining a correct labeling answer corresponding to each test question to be labeled according to the number corresponding to the correct labeling answer of each test question to be labeled;
and analyzing a target test question set corresponding to each target annotator according to the correct annotation answer corresponding to each test question to be annotated to obtain a feedback result of each target annotator.
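The weighted voting inference above can be sketched as follows, assuming one point per reference question so that the confidence is the quotient of the first score (correctly answered reference questions) and the second score (all reference questions); all names are hypothetical.

```python
def annotator_confidence(reference_answers, standard_answers):
    """Confidence = first score / second score: the score on correctly
    answered reference questions divided by the total reference-question
    score, with one point per question in this sketch."""
    correct = sum(1 for q, ans in reference_answers.items()
                  if standard_answers.get(q) == ans)
    return correct / len(standard_answers)

def weighted_voting(labels_per_question, confidences):
    """labels_per_question: {question_id: {annotator_id: label}}.
    For each question i and candidate label w, the number n_iw is the sum
    of the confidences of the annotators whose second labeling result is w;
    the label with the largest n_iw becomes the correct labeling answer."""
    answers = {}
    for qid, votes in labels_per_question.items():
        scores = {}
        for annotator, label in votes.items():
            scores[label] = scores.get(label, 0.0) + confidences[annotator]
        answers[qid] = max(scores, key=scores.get)
    return answers
```

A high-confidence annotator can thus outvote several low-confidence annotators, which is the point of weighting by reference-question accuracy.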
Further, the label calculation of the labeling results of the plurality of target annotators according to a maximum likelihood inference algorithm to obtain the feedback result of each target annotator includes:
identifying a plurality of reference questions in the target test question set of each target annotator;
matching the first labeling result of each reference question with the standard result of the corresponding reference question to obtain a first score of each target annotator;
calculating a second score over the plurality of reference questions of each target annotator;
calculating a quotient between the first score and the second score, determining a confidence for each of the target annotators;
extracting a second test question set to be labeled from the target test question sets of the plurality of target annotators;
identifying a second labeling result of each labeling test question in the second test question set to be labeled, wherein the second labeling result comprises one or more labeling results;
when each test question to be labeled contains a single second labeling result, determining the number corresponding to that second labeling result as the number corresponding to the correct labeling answer of the test question to be labeled; or, when each test question to be labeled contains a plurality of second labeling results, calculating the number corresponding to the correct labeling answer of each test question to be labeled according to the plurality of second labeling results, using the following formula:
n_{iw} = \prod_{j \in L_{iw}} c_j \cdot \prod_{j \in \overline{L}_{iw}} (1 - c_j)

wherein n_{iw} denotes the number corresponding to the correct labeling answer of the i-th test question to be labeled, j indexes the target annotators of each test question to be labeled, L_{iw} denotes the set of target annotators whose second labeling result for the i-th test question to be labeled is w, \overline{L}_{iw} denotes the complement of L_{iw}, and c_j denotes the confidence of the target annotator corresponding to each second labeling result;
determining a correct labeling answer corresponding to each test question to be labeled according to the number corresponding to the correct labeling answer of each test question to be labeled;
and analyzing a target test question set corresponding to each target annotator according to the correct annotation answer corresponding to each test question to be annotated to obtain a feedback result of each target annotator.
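The maximum likelihood inference can be sketched as follows, under the reading that annotators who chose a candidate label contribute their confidence c_j while all others contribute (1 - c_j); the names are hypothetical.

```python
def max_likelihood(labels_per_question, confidences):
    """For each question i and candidate label w:
    n_iw = prod_{j in L_iw} c_j * prod_{j not in L_iw} (1 - c_j),
    where L_iw is the set of annotators whose second labeling result is w.
    The candidate with the largest likelihood wins."""
    answers = {}
    for qid, votes in labels_per_question.items():
        scores = {}
        for w in set(votes.values()):
            likelihood = 1.0
            for annotator, label in votes.items():
                c = confidences[annotator]
                likelihood *= c if label == w else (1.0 - c)
            scores[w] = likelihood
        answers[qid] = max(scores, key=scores.get)
    return answers
```

Unlike plain weighted voting, dissenting annotators actively lower a candidate's score here through the (1 - c_j) factor.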
In other optional embodiments, the performing label estimation on the labeling results of the plurality of target annotators according to a set inference algorithm to obtain the feedback result of each target annotator includes:
identifying a plurality of reference questions in the target test question set of each target annotator;
matching the first labeling result of each reference question with the standard result of the corresponding reference question to obtain a first score of each target annotator;
calculating a second score over the plurality of reference questions of each target annotator;
calculating a quotient between the first score and the second score, determining a confidence for each of the target annotators;
extracting a second test question set to be labeled from the target test question sets of the plurality of target annotators;
identifying a second labeling result of each labeling test question in the second test question set to be labeled, wherein the second labeling result comprises one or more labeling results;
when each test question to be labeled contains a single second labeling result, determining the number corresponding to that second labeling result as the number corresponding to the correct labeling answer of the test question to be labeled; or, when each test question to be labeled contains a plurality of second labeling results: calculating the sum of the confidences of the plurality of target annotators corresponding to each second labeling result of each test question to be labeled, to obtain a first confidence of each second labeling result; calculating the product of the confidences of the plurality of target annotators corresponding to each second labeling result, to obtain a first product; subtracting the confidence of each target annotator from 1, to obtain a target confidence of each target annotator, and calculating the product of the target confidences of the plurality of target annotators corresponding to each second labeling result, to obtain a second product; and calculating the product of the first product and the second product, to obtain a second confidence of each second labeling result of each test question to be labeled;
calculating an average value between the first confidence coefficient of each second labeling result and the second confidence coefficient of the corresponding second labeling result to obtain a third confidence coefficient of each second labeling result of each test question to be labeled;
selecting a second labeling result number corresponding to the maximum third confidence coefficient from a plurality of third confidence coefficients of a plurality of second labeling results of each test question to be labeled, and determining a correct labeling answer corresponding to each test question to be labeled according to the second labeling result number corresponding to the maximum third confidence coefficient;
and analyzing a target test question set corresponding to each target annotator according to the correct annotation answer corresponding to each test question to be annotated to obtain a feedback result of each target annotator.
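The set inference can be sketched as follows, under one plausible reading of the steps above: the first confidence is the sum of the supporters' confidences, the second confidence multiplies the supporters' confidences by (1 - c) over the remaining annotators, and the third confidence is their average. Both this reading and all names are assumptions for illustration.

```python
def set_inference(labels_per_question, confidences):
    """labels_per_question: {question_id: {annotator_id: label}}.
    For each candidate label w of question i, combine a sum-based and a
    product-based confidence, then pick the label whose averaged (third)
    confidence is largest."""
    answers = {}
    for qid, votes in labels_per_question.items():
        scores = {}
        for w in set(votes.values()):
            supporters = [a for a, lab in votes.items() if lab == w]
            others = [a for a, lab in votes.items() if lab != w]
            first_conf = sum(confidences[a] for a in supporters)
            first_product = 1.0
            for a in supporters:
                first_product *= confidences[a]
            second_product = 1.0
            for a in others:
                second_product *= 1.0 - confidences[a]  # target confidences
            second_conf = first_product * second_product
            scores[w] = (first_conf + second_conf) / 2.0  # third confidence
        answers[qid] = max(scores, key=scores.get)
    return answers
```

Because the sum term rewards the breadth of agreement while the product term rewards its strength, this combination can favor a coalition of moderately confident annotators over a single confident dissenter.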
In this embodiment, the first score represents the score each target annotator obtains on correctly answered reference questions, the second score represents the total score of the plurality of reference questions in the target test question set of each target annotator, and the confidence represents the probability that each target annotator labels a reference question correctly.
In this embodiment, the feedback result includes functions of querying the associated knowledge points of each test question to be labeled, the correct labeling answers, the incorrect labeling answers, reading the analysis of incorrect answers, and redoing the wrong questions.
In this embodiment, when the correct labeling answers of the test questions to be labeled are inferred, reference questions are added into the test question set to be labeled, the confidence of each target annotator is determined according to that annotator's labeling of the reference questions, and the correct labeling answer of each test question to be labeled is inferred by a preset inference algorithm according to the confidence of each target annotator. Instead of determining the correct labeling answer of each test question directly from its labeling results, the method takes each target annotator's accuracy on the reference questions into account, which ensures the accuracy of the obtained correct labeling answers and improves the accuracy and efficiency of the feedback results.
And S15, sending the feedback result of each target annotator to the corresponding target annotator client.
In this embodiment, when the feedback result of each target annotator is obtained, the feedback result of each target annotator may be sent to the corresponding target annotator client according to a preset sending manner, and specifically, the preset sending manner may include one or a combination of the following manners: a short message mode, an email mode or a WeChat mode.
In an optional embodiment, the sending the feedback result of each target annotator to the corresponding target annotator client comprises:
and converting the feedback result of each target annotator into a feedback result in a preset format, and sending the feedback result in the preset format to the corresponding target annotator client according to a preset sending mode.
In this embodiment, the preset format may include one or a combination of the following modes: picture format, PDF format, EXCEL format, editable format, non-editable format, encrypted format, and unencrypted format.
Illustratively, if the confidentiality level of the task to be labeled is high, the feedback result of each target annotator can be converted into the encrypted format when it is obtained, thereby improving the safety of the feedback result.
In the embodiment, the feedback result can be converted into various formats, so that the diversity and the flexibility of the feedback result are improved.
In this embodiment, because the feedback result includes functions of querying the associated knowledge points of each test question to be labeled, the correct labeling answers, the incorrect labeling answers, reading the analysis of incorrect answers, and redoing the wrong questions, a target annotator receiving the feedback result can master the wrong-question analysis and related knowledge of each test question to be labeled, thereby improving the professional literacy of each target annotator.
In summary, in the data processing method according to this embodiment, in the process of determining the target test question set of each target annotator, the target annotators are classified into groups according to their annotation portraits, so that one set of target test questions is generated for all target annotators of a group type. On one hand, this reduces the number of target test question sets, improves the generation efficiency of the target test question sets, and further improves the feedback efficiency of the annotation results. When the correct labeling answers of the test questions to be labeled are inferred, reference questions are added into the test question set to be labeled, the confidence of each target annotator is determined according to that annotator's labeling of the reference questions, and the correct labeling answer of each test question to be labeled is inferred by a preset inference algorithm according to the confidence of each target annotator. Instead of determining the correct labeling answer directly from the labeling results of the test questions to be labeled, the method takes each target annotator's accuracy on the reference questions into account, which ensures the accuracy of the obtained correct labeling answers and improves the accuracy and efficiency of the feedback results.
The obtained feedback result is then sent to the client of the corresponding target annotator. The feedback result includes functions of querying the associated knowledge points of each test question to be labeled, the correct labeling answers, the incorrect labeling answers, reading the analysis of incorrect answers, redoing the wrong questions and the like. When a target annotator receives the feedback result, the annotator can master the wrong-question analysis and related knowledge of each test question to be labeled, which further improves the professional literacy of each target annotator.
Example two
Fig. 2 is a structural diagram of a data processing apparatus according to a second embodiment of the present application.
In some embodiments, the data processing apparatus 20 may comprise a plurality of functional modules comprised of program code segments. The program code of the various program segments in the data processing device 20 may be stored in a memory of the electronic device and executed by the at least one processor to perform the functions of data processing (described in detail in fig. 1).
In this embodiment, the data processing apparatus 20 may be divided into a plurality of functional modules according to the functions performed by the data processing apparatus. The functional module may include: the system comprises an analysis module 201, a determination module 202, a first sending module 203, a label calculation module 204 and a second sending module 205. A module as referred to herein is a series of computer readable instruction segments stored in a memory capable of being executed by at least one processor and capable of performing a fixed function. In the present embodiment, the functions of the modules will be described in detail in the following embodiments.
The parsing module 201 is configured to parse the received data annotation task request to obtain a plurality of target annotators, a start annotation instruction, and an end annotation instruction.
In this embodiment, when a user carries out a labeling task, the user initiates a data labeling task request to the server through a client. Specifically, the client can be a smart phone, an iPad or another smart device, and the server can be a crowdsourcing labeling system used for manual real-time parallel labeling and verification of large-scale data. When the crowdsourcing labeling system receives the data labeling task request sent by the client, it parses the data labeling task request and acquires a plurality of target annotators, a start annotation instruction and an end annotation instruction.
In an optional embodiment, the parsing module 201 parses the received data annotation task request, and acquiring a plurality of target annotators, a start annotation instruction, and an end annotation instruction includes:
analyzing the message of the data marking task request to obtain message information carried by the message;
acquiring a plurality of login information, the starting time for executing the labeling task and the ending time for executing the labeling task from the message information, wherein each login information comprises a login name of a label and a login equipment identification code;
determining a target annotator matched with the login name of each annotator from a preset annotator database, and associating each target annotator with a corresponding login equipment identification code;
and generating a starting annotation instruction according to the associated target annotators and the starting time of the execution annotation task, and generating an ending annotation instruction according to the associated target annotators and the ending time of the execution annotation task.
In this embodiment, when a corpus is labeled in the crowdsourcing labeling system, one or more target annotators are needed to label the same corpus to be labeled. When a plurality of target annotators are needed, feedback on the labeling information can only be given after all target annotators have finished labeling; however, because the labeling times of the target annotators are not uniform, it is difficult to obtain the labeling feedback result immediately after each target annotator finishes. In this scheme, when a task labeling request is received, a start annotation instruction and an end annotation instruction are generated according to the start time and the end time of the labeling task; when the labeling task is subsequently executed, the start annotation instruction and the end annotation instruction are sent to the target annotators simultaneously, and each target annotator executes the labeling task in response to these instructions sent by the crowdsourcing labeling system. This solves the problem that the labeling feedback result cannot be obtained immediately because the labeling times of the plurality of target annotators are not uniform.
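The parsing flow described in the steps above (message fields, annotator matching, instruction generation) might be sketched as follows; the message field names and dictionary shapes are hypothetical, chosen only for illustration.

```python
def parse_annotation_request(message, annotator_db):
    """message: dict with 'logins' (each {'login_name', 'device_id'}),
    'start_time', and 'end_time'. annotator_db: {login_name: record},
    standing in for the preset annotator database.
    Returns the matched target annotators, each associated with its login
    device identification code, plus the generated start/end instructions."""
    targets = []
    for login in message["logins"]:
        record = annotator_db.get(login["login_name"])
        if record is not None:  # only known annotators become targets
            targets.append({"annotator": record,
                            "device_id": login["device_id"]})
    start_instruction = {"targets": targets, "time": message["start_time"]}
    end_instruction = {"targets": targets, "time": message["end_time"]}
    return targets, start_instruction, end_instruction
```

Both instructions carry the same associated target annotators, so they can later be dispatched to all annotators simultaneously as step S13 requires.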
The determining module 202 is configured to determine, according to the annotation portraits of the multiple target annotators, a target test question set corresponding to each of the target annotators.
In this embodiment, when multiple target annotators are needed to label the same corpus, knowledge from multiple fields may exist in that corpus, so the multiple target annotators may also come from different fields; the annotation portrait of each target annotator differs, and the corresponding target test question sets differ as well.
In an optional embodiment, the determining module 202 determines, according to the annotation images of the target annotators, a target test question set corresponding to each of the target annotators, including:
identifying the login information of each target annotator, and acquiring the annotation portrait of each target annotator based on the login information;
inputting the annotation portraits of the target annotators into a pre-trained group classification model to obtain the group category of each target annotator;
extracting a plurality of key labels from the annotation portrait of each target annotator, and determining the target test question set of each target annotator according to the key labels of each target annotator, the corresponding group category and the first to-be-labeled test question set in the data labeling task request, wherein the target test question set comprises a plurality of reference questions and a second to-be-labeled test question set.
In this embodiment, the annotation portrait of each target annotator can be obtained according to the login name in the login information of each target annotator. Specifically, the annotation portrait includes a plurality of key tags; in the subsequent determination of the target test question set, the plurality of key tags of each target annotator need to be considered, and they may include one or a combination of the following: annotation grade, annotation field range, historical annotation level, label completion amount and the like.
In this embodiment, a plurality of target annotators can correspond to one set of target test questions or to a plurality of sets, and the test questions to be labeled and the reference questions in the multiple sets can be repeated. This embodiment classifies the plurality of target annotators into groups according to their annotation portraits and generates one set of target test questions for the target annotators of a group type. On one hand, this reduces the number of target test question sets, improves their generation efficiency, and improves the feedback efficiency of the labeling results. Meanwhile, because a plurality of target annotators label the same set of target test questions simultaneously and the same test question to be labeled can appear in different target test question sets, the number of labeling results for the same test question to be labeled is increased, which improves the accuracy of the correct labeling answer subsequently determined for each test question to be labeled.
In this embodiment, a group classification model may be trained in advance, labels such as the labeling level, the labeling field range, the historical labeling level, and the label completion amount of the annotator are input into the group classification model trained in advance, and the group category of the annotator is identified according to the group classification model.
Optionally, the training process of the group classification model includes:
21) acquiring a plurality of annotators and corresponding group types;
22) extracting a plurality of preset key labels in the labeling portrait corresponding to each labeling person;
23) taking the plurality of key labels as a sample data set;
24) dividing a training set and a testing set from the sample data set;
25) inputting the training set into a preset neural network for training to obtain a group classification model;
26) inputting the test set into the group classification model for testing, and calculating the test passing rate;
27) when the test passing rate is greater than or equal to a preset passing rate threshold value, finishing the training of the group classification model; and when the test passing rate is smaller than the preset passing rate threshold, increasing the number of the training sets, and re-training the group classification model.
In this embodiment, group types may be preset, each group type being set according to the key tags of the annotators. The training process obtains a plurality of annotators of each type and extracts the plurality of preset key labels from the annotation portrait corresponding to each annotator, then trains the group classification model on those key labels and the group type identifications. Thereafter, the group type to which an annotator belongs can be identified by the group classification model simply by obtaining the plurality of key labels in the annotation portrait corresponding to that annotator.
Further, the determining the target test question set of each target annotator according to the plurality of key labels of each target annotator, the corresponding group categories and the corpora to be annotated in the data annotation task request includes:
determining a second to-be-labeled test question set of each target annotator from a first to-be-labeled test question set in the data labeling task request according to the plurality of key labels and corresponding group category of each target annotator and a preset screening rule;
calculating the similarity between the linguistic data to be labeled in the data labeling task request and each labeled linguistic data in the constructed labeled corpus;
extracting, according to the calculated similarities, a plurality of first test questions corresponding to the labeling corpora with the highest similarity, extracting a plurality of second test questions from the plurality of first test questions according to the plurality of key labels and corresponding group category of each target annotator, and determining the plurality of second test questions as the plurality of reference questions of each target annotator;
and combining the plurality of reference questions of each target annotator and the corresponding second test question set to be annotated to obtain the target test question set of each target annotator.
In this embodiment, the first test question corresponding to each labeling corpus in the constructed labeled corpus includes a correct labeling answer. A target test question set is determined for each target annotator, and reference questions are added to that set, so that the confidence of each target annotator can be determined according to that annotator's accuracy on the reference questions. When performing label inference on the test questions to be labeled, considering each target annotator's accuracy on the reference questions improves the accuracy of the correct labeling answers.
Specifically, the construction process of the annotation-containing corpus includes:
collecting a disclosed public corpus from a plurality of preset data sources;
sending an authorization request to the publisher corresponding to the non-public labeled corpus; when an authorization instruction of the publisher is detected, collecting the non-public labeled corpus in response to the authorization instruction, and loading the non-public labeled corpus into the public corpus to obtain the labeled corpus.
In this embodiment, a plurality of data sources can be preset; a preset data source can be a data storage platform, a third-party application platform, a verification data platform and the like, and a public corpus having an association relation with the labeled corpus in the labeling task can be collected through the preset data sources. Specifically, in the process of constructing the labeled corpus, two dimensions are considered: the disclosed public corpus and the undisclosed corpus to be labeled. This ensures the diversity and integrity of the constructed labeled corpus.
The first sending module 203 is configured to send the corresponding target test question sets to the plurality of target annotators simultaneously in response to the annotation starting instruction, and to collect the annotation results of the plurality of target annotators simultaneously in response to the annotation ending instruction.
In this embodiment, in order to ensure the feedback efficiency and accuracy of the labeling results, the corresponding target test question sets may be sent to the plurality of target annotators simultaneously for labeling, and the labeling results of the plurality of target annotators may be collected simultaneously in response to the annotation ending instruction.
And the label calculation module 204 is configured to perform label calculation on the labeling results of the multiple target annotators according to a preset inference algorithm to obtain a feedback result of each target annotator.
In this embodiment, the preset inference algorithm includes one or more of the following combinations: weighted voting inference algorithms, maximum likelihood inference algorithms, and set inference algorithms.
In an optional embodiment, the label calculation module 204 performs label calculation on the labeling results of the plurality of target annotators according to a preset inference algorithm, and obtaining the feedback result of each target annotator includes:
performing label calculation on the labeling results of the plurality of target annotators according to a weighted voting inference algorithm to obtain the feedback result of each target annotator; or
Performing label calculation on the labeling results of the plurality of target annotators according to a maximum likelihood inference algorithm to obtain the feedback result of each target annotator; or
And performing label calculation on the labeling results of the plurality of target annotators according to a set inference algorithm to obtain the feedback result of each target annotator.
Further, the label calculation of the labeling results of the plurality of target annotators according to a weighted voting inference algorithm to obtain the feedback result of each target annotator includes:
identifying a plurality of reference questions in the target test question set of each target annotator;
matching the first labeling result of each reference question with the standard result of the corresponding reference question to obtain a first score of each target annotator;
calculating a second score of the plurality of reference questions of each target annotator;
calculating a quotient between the first score and the second score, determining a confidence for each of the target annotators;
extracting a second test question set to be labeled from the target test question sets of the plurality of target annotators;
identifying a second labeling result of each labeling test question in the second test question set to be labeled, wherein the second labeling result comprises one or more labeling results;
when each test question to be labeled contains a second labeling result, determining the number of the second labeling result as the number corresponding to the correct labeling answer of each test question to be labeled; or, when each test question to be labeled contains a plurality of second labeling results, calculating a number corresponding to a correct labeling answer of each test question to be labeled according to the plurality of second labeling results, and calculating by adopting the following formula:
i* = argmax_w Σ_{j ∈ S_iw} c_j

wherein i* represents the number corresponding to the correct labeling answer of each test question to be labeled, j ranges over the target annotators of each test question to be labeled, S_iw represents the set of target annotators who give the w-th second labeling result for the ith test question to be labeled, and c_j represents the confidence of target annotator j corresponding to each second labeling result;
determining a correct labeling answer corresponding to each test question to be labeled according to the number corresponding to the correct labeling answer of each test question to be labeled;
and analyzing a target test question set corresponding to each target annotator according to the correct annotation answer corresponding to each test question to be annotated to obtain a feedback result of each target annotator.
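The weighted-voting inference described above — confidence computed as the quotient of the two reference-question scores, then a confidence-weighted vote over the second labeling results — can be sketched roughly as follows; the data layout and function names are assumptions for illustration, not the patent's implementation.

```python
# Hypothetical sketch of confidence computation and weighted-voting inference.

def annotator_confidence(first_score, second_score):
    # Confidence = quotient of the score on correctly answered reference
    # questions (first score) and the total reference-question score (second).
    return first_score / second_score

def weighted_vote(votes, confidence):
    """votes: {annotator_id: answer}; confidence: {annotator_id: c_j}.
    Returns the answer w maximizing the sum of c_j over annotators who chose w."""
    totals = {}
    for annotator, answer in votes.items():
        totals[answer] = totals.get(answer, 0.0) + confidence[annotator]
    return max(totals, key=totals.get)


conf = {"a": annotator_confidence(4, 5),   # 0.8
        "b": annotator_confidence(3, 5),   # 0.6
        "c": annotator_confidence(2, 5)}   # 0.4
votes = {"a": "B", "b": "A", "c": "A"}
# "B" collects weight 0.8; "A" collects 0.6 + 0.4 = 1.0, so "A" wins even
# though the single highest-confidence annotator voted "B".
print(weighted_vote(votes, conf))  # A
```

This illustrates why the reference questions matter: the vote is weighted by each annotator's measured reliability rather than counted one-annotator-one-vote.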
Further, the label calculation of the labeling results of the plurality of target annotators according to a maximum likelihood inference algorithm to obtain the feedback result of each target annotator includes:
identifying a plurality of reference questions in the target test question set of each target annotator;
matching the first labeling result of each reference question with the standard result of the corresponding reference question to obtain a first score of each target annotator;
calculating a second score of the plurality of reference questions of each target annotator;
calculating a quotient between the first score and the second score, determining a confidence for each of the target annotators;
extracting a second test question set to be labeled from the target test question sets of the plurality of target annotators;
identifying a second labeling result of each labeling test question in the second test question set to be labeled, wherein the second labeling result comprises one or more labeling results;
when each test question to be labeled contains a second labeling result, determining the number of the second labeling result as the number corresponding to the correct labeling answer of each test question to be labeled; or, when each test question to be labeled contains a plurality of second labeling results, calculating a number corresponding to a correct labeling answer of each test question to be labeled according to the plurality of second labeling results, and calculating by adopting the following formula:
i* = argmax_w ( Π_{j ∈ S_iw} c_j ) · ( Π_{j ∈ S̄_iw} (1 − c_j) )

wherein i* represents the number corresponding to the correct labeling answer of each test question to be labeled, j ranges over the target annotators of each test question to be labeled, S_iw represents the set of target annotators who give the w-th second labeling result for the ith test question to be labeled, S̄_iw represents the complement of S_iw, and c_j represents the confidence of target annotator j corresponding to each second labeling result;
determining a correct labeling answer corresponding to each test question to be labeled according to the number corresponding to the correct labeling answer of each test question to be labeled;
and analyzing a target test question set corresponding to each target annotator according to the correct annotation answer corresponding to each test question to be annotated to obtain a feedback result of each target annotator.
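Under one plausible reading of the maximum-likelihood inference above — each candidate answer is scored by the product of the confidences of annotators who chose it and the complements (1 − c_j) of those who did not — the step might look like this sketch; the names and data layout are assumptions.

```python
# Hypothetical sketch of maximum-likelihood answer inference: score each
# candidate answer w by prod(c_j for j choosing w) * prod(1 - c_j for the
# complement set), computed in the log domain for numerical stability.
import math

def max_likelihood_answer(votes, confidence):
    """votes: {annotator_id: answer}; confidence: {annotator_id: c_j}."""
    best, best_ll = None, -math.inf
    for w in set(votes.values()):
        ll = 0.0
        for j, answer in votes.items():
            # c_j if annotator j chose w, otherwise 1 - c_j (complement set).
            p = confidence[j] if answer == w else 1.0 - confidence[j]
            ll += math.log(p)
        if ll > best_ll:
            best, best_ll = w, ll
    return best


votes = {"a": "B", "b": "A", "c": "A"}
conf = {"a": 0.9, "b": 0.6, "c": 0.6}
# "B": 0.9 * 0.4 * 0.4 = 0.144; "A": 0.1 * 0.6 * 0.6 = 0.036 -> "B" wins,
# unlike plain weighted voting, because the complement terms penalize "A".
print(max_likelihood_answer(votes, conf))  # B
```

Note how the complement factor can overturn a simple weighted majority when one dissenting annotator is highly reliable.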
In other optional embodiments, the performing label calculation on the labeling results of the plurality of target annotators according to a set inference algorithm to obtain the feedback result of each target annotator includes:
identifying a plurality of reference questions in the target test question set of each target annotator;
matching the first labeling result of each reference question with the standard result of the corresponding reference question to obtain a first score of each target annotator;
calculating a second score of the plurality of reference questions of each target annotator;
calculating a quotient between the first score and the second score, determining a confidence for each of the target annotators;
extracting a second test question set to be labeled from the target test question sets of the plurality of target annotators;
identifying a second labeling result of each labeling test question in the second test question set to be labeled, wherein the second labeling result comprises one or more labeling results;
when each test question to be labeled contains a single second labeling result, determining the number of the second labeling result as the number corresponding to the correct labeling answer of each test question to be labeled; or, when each test question to be annotated comprises a plurality of second annotation results, calculating the sum of the confidences of the plurality of target annotators corresponding to each second annotation result of each test question to be annotated to obtain a first confidence of each second annotation result of each test question to be annotated; calculating the product of the confidences of the plurality of target annotators corresponding to each second annotation result to obtain a first product; subtracting the confidence of each target annotator from 1 to obtain a target confidence of each target annotator, and calculating the product of the target confidences of the plurality of target annotators corresponding to each second annotation result to obtain a second product; and calculating the product of the first product and the second product to obtain a second confidence of each second annotation result of each test question to be annotated;
calculating an average value between the first confidence coefficient of each second labeling result and the second confidence coefficient of the corresponding second labeling result to obtain a third confidence coefficient of each second labeling result of each test question to be labeled;
selecting a second labeling result number corresponding to the maximum third confidence coefficient from a plurality of third confidence coefficients of a plurality of second labeling results of each test question to be labeled, and determining a correct labeling answer corresponding to each test question to be labeled according to the second labeling result number corresponding to the maximum third confidence coefficient;
and analyzing a target test question set corresponding to each target annotator according to the correct annotation answer corresponding to each test question to be annotated to obtain a feedback result of each target annotator.
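The set-inference step above might be sketched as follows, under the assumption that the second confidence multiplies the confidences of the annotators who chose an answer by the target confidences (1 − c_j) of the remaining annotators, and that the answer with the largest average of first and second confidences (the third confidence) is selected; this reading and all names are assumptions, not the patent's implementation.

```python
# Hypothetical sketch of set inference: for each candidate answer w,
#   first confidence  = sum of c_j over annotators who chose w
#   second confidence = prod(c_j for choosers) * prod(1 - c_j for the rest)
#   third confidence  = average of the two; the largest third confidence wins.

def set_inference(votes, confidence):
    """votes: {annotator_id: answer}; confidence: {annotator_id: c_j}."""
    best, best_score = None, float("-inf")
    for w in set(votes.values()):
        choosers = [j for j, a in votes.items() if a == w]
        others = [j for j in votes if votes[j] != w]
        first = sum(confidence[j] for j in choosers)
        second = 1.0
        for j in choosers:
            second *= confidence[j]          # first product
        for j in others:
            second *= 1.0 - confidence[j]    # second product (target confidences)
        third = (first + second) / 2.0       # average of the two confidences
        if third > best_score:
            best, best_score = w, third
    return best


votes = {"a": "B", "b": "A", "c": "A"}
conf = {"a": 0.9, "b": 0.6, "c": 0.6}
# "A": (1.2 + 0.036) / 2 = 0.618; "B": (0.9 + 0.144) / 2 = 0.522 -> "A" wins.
print(set_inference(votes, conf))  # A
```

Averaging the sum-based and product-based confidences blends the voting and likelihood views, which is presumably why the document treats set inference as a third option alongside the other two algorithms.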
In this embodiment, the first score represents the score each target annotator obtains on correctly labeled reference questions, the second score represents the total score of the reference questions in the target test question set of each target annotator, and the confidence represents the probability that each target annotator labels the reference questions correctly.
In this embodiment, the feedback result includes functions such as querying the associated knowledge points of each test question to be labeled, viewing the correct labeling answers and the wrong labeling answers, reading the analysis of the wrong labeling answers, and redoing the wrongly answered questions.
In this embodiment, when the correct labeling answers of the test questions to be labeled are inferred, reference questions are added to the test question set to be labeled, and the confidence of each target annotator is determined from that annotator's labeling of the reference questions. The correct labeling answer of each test question to be labeled is then inferred from the confidences of the target annotators using a preset inference algorithm, rather than being determined directly from the labeling results of the test questions to be labeled. Because the accuracy of each target annotator on the reference questions is taken into account, the accuracy of the obtained correct labeling answers is ensured, and the accuracy and efficiency of the fed-back results are improved.
A second sending module 205, configured to send the feedback result of each target annotator to the corresponding target annotator client.
In this embodiment, when the feedback result of each target annotator is obtained, the feedback result of each target annotator may be sent to the corresponding target annotator client according to a preset sending manner, and specifically, the preset sending manner may include one or a combination of the following manners: a short message mode, an email mode or a WeChat mode.
In an optional embodiment, the sending, by the second sending module 205, the feedback result of each target annotator to the corresponding target annotator client includes:
and converting the feedback result of each target annotator into a feedback result in a preset format, and sending the feedback result in the preset format to the corresponding target annotator client according to a preset sending mode.
In this embodiment, the preset format may include one or a combination of the following modes: picture format, PDF format, EXCEL format, editable format, non-editable format, encrypted format, and unencrypted format.
Illustratively, if the confidentiality level of the task to be labeled is high, the feedback result of each target annotator can be converted into the encrypted format when it is obtained, thereby improving the security of the feedback result.
In the embodiment, the feedback result can be converted into various formats, so that the diversity and the flexibility of the feedback result are improved.
In this embodiment, since the feedback result includes functions such as querying the associated knowledge points of each test question to be labeled, viewing the correct and wrong labeling answers, reading the analysis of the wrong labeling answers, and redoing the wrongly answered questions, a target annotator who receives the feedback result can master the wrong-answer analysis and related knowledge of each test question to be labeled, thereby improving the professional competence of each target annotator.
In summary, in the data processing apparatus according to this embodiment, in the process of determining the target test question set of each target annotator, the target annotators are classified into groups according to their annotation images, and one target test question set is generated for the target annotators of each group type. On the one hand, this reduces the number of target test question sets and improves the efficiency of generating them, and thus improves the feedback efficiency of the labeling results. When the correct labeling answers of the test questions to be labeled are inferred, reference questions are added to the test question set to be labeled, the confidence of each target annotator is determined from that annotator's labeling of the reference questions, and the correct labeling answer of each test question to be labeled is inferred from these confidences using a preset inference algorithm. Rather than determining the correct labeling answer directly from the labeling results of the test questions to be labeled, the accuracy of each target annotator on the reference questions is taken into account, which ensures the accuracy of the obtained correct labeling answers and improves the accuracy and efficiency of the fed-back results.
The obtained feedback result is sent to the client of the corresponding target annotator. Since the feedback result includes functions such as querying the associated knowledge points of each test question to be labeled, viewing the correct and wrong labeling answers, reading the analysis of the wrong labeling answers, and redoing the wrongly answered questions, a target annotator who receives the feedback result can master the wrong-answer analysis and related knowledge of each test question to be labeled, further improving the professional competence of each target annotator.
EXAMPLE III
Fig. 3 is a schematic structural diagram of an electronic device according to a third embodiment of the present application. In the preferred embodiment of the present application, the electronic device 3 comprises a memory 31, at least one processor 32, at least one communication bus 33, and a transceiver 34.
It will be appreciated by those skilled in the art that the configuration of the electronic device shown in fig. 3 does not constitute a limitation of the embodiments of the present application; the configuration may be a bus-type or star-type configuration, and the electronic device 3 may include more or fewer hardware or software components than those shown, or a different arrangement of components.
In some embodiments, the electronic device 3 is an electronic device capable of automatically performing numerical calculation and/or information processing according to instructions set or stored in advance, and the hardware thereof includes but is not limited to a microprocessor, an application specific integrated circuit, a programmable gate array, a digital processor, an embedded device, and the like. The electronic device 3 may also include a client device, which includes, but is not limited to, any electronic product that can interact with a client through a keyboard, a mouse, a remote controller, a touch pad, or a voice control device, for example, a personal computer, a tablet computer, a smart phone, a digital camera, and the like.
It should be noted that the electronic device 3 is only an example, and other existing or future electronic products, such as those that can be adapted to the present application, should also be included in the scope of protection of the present application, and are included by reference.
In some embodiments, the memory 31 is used for storing program codes and various data, such as the data processing device 20 installed in the electronic device 3, and realizes high-speed and automatic access to programs or data during the operation of the electronic device 3. The memory 31 includes a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), a One-Time Programmable Read-Only Memory (OTPROM), an Electrically-Erasable Programmable Read-Only Memory (EEPROM), a Compact Disc Read-Only Memory (CD-ROM) or other optical disk memory, magnetic disk memory, tape memory, or any other computer-readable medium capable of carrying or storing data.
In some embodiments, the at least one processor 32 may be composed of an integrated circuit, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital Processing chips, graphics processors, and combinations of various control chips. The at least one processor 32 is a Control Unit (Control Unit) of the electronic device 3, connects various components of the electronic device 3 by using various interfaces and lines, and executes various functions and processes data of the electronic device 3 by running or executing programs or modules stored in the memory 31 and calling data stored in the memory 31.
In some embodiments, the at least one communication bus 33 is arranged to enable connection communication between the memory 31 and the at least one processor 32 or the like.
Although not shown, the electronic device 3 may further include a power supply (such as a battery) for supplying power to each component, and optionally, the power supply may be logically connected to the at least one processor 32 through a power management device, so as to implement functions of managing charging, discharging, and power consumption through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device 3 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The integrated unit implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, an electronic device, or a network device) or a processor (processor) to execute parts of the methods according to the embodiments of the present application.
In a further embodiment, in conjunction with fig. 2, the at least one processor 32 may execute the operating system of the electronic device 3 and various installed application programs (such as the data processing device 20), program codes, and the like, for example, the modules described above.
The memory 31 has program code stored therein, and the at least one processor 32 can call the program code stored in the memory 31 to perform related functions. For example, the modules illustrated in fig. 2 are program codes stored in the memory 31 and executed by the at least one processor 32, so as to realize the functions of the modules for the purpose of data processing.
Illustratively, the program code may be partitioned into one or more modules/units that are stored in the memory 31 and executed by the processor 32 to accomplish the present application. The one or more modules/units may be a series of computer readable instruction segments capable of performing certain functions, which are used for describing the execution process of the program code in the electronic device 3. For example, the program code may be divided into a parsing module 201, a determining module 202, a first sending module 203, a tag calculating module 204, and a second sending module 205.
In one embodiment of the present application, the memory 31 stores a plurality of computer readable instructions that are executed by the at least one processor 32 to implement the functionality of data processing.
Specifically, the at least one processor 32 may refer to the description of the relevant steps in the embodiment corresponding to fig. 1, and details are not repeated here.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or that the singular does not exclude the plural. A plurality of units or means recited in the present application may also be implemented by one unit or means through software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.
Finally, it should be noted that the above embodiments are only used for illustrating the technical solutions of the present application and not for limiting, and although the present application is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions can be made on the technical solutions of the present application without departing from the spirit and scope of the technical solutions of the present application.

Claims (9)

1. A method of data processing, the method comprising:
analyzing the received data annotation task request to obtain a plurality of target annotators, a marking starting instruction and a marking ending instruction;
determining a target test question set corresponding to each target annotator according to the annotation images of the plurality of target annotators;
responding to the marking starting instruction, simultaneously sending corresponding target test question sets to the target markers, and responding to the marking ending instruction, and simultaneously collecting marking results of the target markers;
performing label calculation on the labeling results of the plurality of target annotators according to a preset inference algorithm to obtain a feedback result of each target annotator, wherein the label calculation comprises the following steps:
identifying a plurality of reference questions in the target test question set of each target annotator;
matching the first labeling result of each reference question with the standard result of the corresponding reference question to obtain a first score of each target annotator;
calculating a second score of the plurality of reference questions of each target annotator;
calculating a quotient between the first score and the second score, determining a confidence for each of the target annotators;
extracting a second test question set to be labeled from the target test question sets of the plurality of target annotators;
identifying a second labeling result of each labeling test question in the second test question set to be labeled, wherein the second labeling result comprises one or more labeling results;
when each test question to be labeled contains a second labeling result, determining the number of the second labeling result as the number corresponding to the correct labeling answer of each test question to be labeled; or, when each test question to be labeled contains a plurality of second labeling results, calculating a number corresponding to a correct labeling answer of each test question to be labeled according to the plurality of second labeling results, and calculating by adopting the following formula:
i* = argmax_w Σ_{j ∈ S_iw} c_j

wherein i* represents a number corresponding to a correct labeling answer of each test question to be labeled, j ranges over the target annotators of each test question to be labeled, S_iw represents the set of target annotators who give the w-th second labeling result for the ith test question to be labeled, and c_j represents the confidence of each target annotator corresponding to each second labeling result;
determining a correct labeling answer corresponding to each test question to be labeled according to the number corresponding to the correct labeling answer of each test question to be labeled;
analyzing a target test question set corresponding to each target annotator according to a correct annotation answer corresponding to each test question to be annotated to obtain a feedback result of each target annotator;
and sending the feedback result of each target annotator to the corresponding target annotator client.
2. The data processing method of claim 1, wherein the performing label calculation on the labeling results of the plurality of target annotators according to a preset inference algorithm to obtain the feedback result of each target annotator further comprises:
performing label calculation on the labeling results of the plurality of target annotators according to a maximum likelihood inference algorithm to obtain the feedback result of each target annotator; or
And performing label calculation on the labeling results of the plurality of target annotators according to a set inference algorithm to obtain a feedback result of each target annotator.
3. The data processing method of claim 2, wherein the performing label calculation on the labeling results of the plurality of target annotators according to the maximum likelihood inference algorithm to obtain the feedback result of each target annotator comprises:
identifying a plurality of reference questions in the target test question set of each target annotator;
matching the first labeling result of each reference question with the standard result of the corresponding reference question to obtain a first score of each target annotator;
calculating a second score of the plurality of reference questions of each target annotator;
calculating a quotient between the first score and the second score, determining a confidence for each of the target annotators;
extracting a second test question set to be labeled from the target test question sets of the plurality of target annotators;
identifying a second labeling result of each labeling test question in the second test question set to be labeled, wherein the second labeling result comprises one or more labeling results;
when each test question to be labeled contains a second labeling result, determining the number of the second labeling result as the number corresponding to the correct labeling answer of each test question to be labeled; or, when each test question to be labeled contains a plurality of second labeling results, calculating a number corresponding to a correct labeling answer of each test question to be labeled according to the plurality of second labeling results, and calculating by adopting the following formula:
i* = argmax_w ( Π_{j ∈ S_iw} c_j ) · ( Π_{j ∈ S̄_iw} (1 − c_j) )

wherein i* represents a number corresponding to a correct labeling answer of each test question to be labeled, j ranges over the target annotators of each test question to be labeled, S_iw represents the set of target annotators who give the w-th second labeling result for the ith test question to be labeled, S̄_iw represents the complement of S_iw, and c_j represents the confidence of each target annotator corresponding to each second labeling result;
determining a correct labeling answer corresponding to each test question to be labeled according to the number corresponding to the correct labeling answer of each test question to be labeled;
and analyzing a target test question set corresponding to each target annotator according to the correct annotation answer corresponding to each test question to be annotated to obtain a feedback result of each target annotator.
4. The data processing method of claim 1, wherein the parsing the received data annotation task request to obtain a plurality of target annotators, a start annotation instruction, and an end annotation instruction comprises:
analyzing the message of the data annotation task request to obtain message information carried by the message;
acquiring a plurality of pieces of login information, a start time for executing the annotation task and an end time for executing the annotation task from the message information, wherein each piece of login information comprises the login name of an annotator and a login device identification code;
determining a target annotator matched with the login name of each annotator from a preset annotator database, and associating each target annotator with a corresponding login equipment identification code;
and generating a start annotation instruction according to the associated target annotators and the start time for executing the annotation task, and generating an end annotation instruction according to the associated target annotators and the end time for executing the annotation task.
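The request-parsing steps above can be sketched as follows; the message field names (`logins`, `start_time`, `end_time`, `login_name`, `device_id`) and the in-memory annotator database are hypothetical stand-ins, not fields defined by the patent.

```python
from dataclasses import dataclass

@dataclass
class AnnotationInstruction:
    annotators: dict  # target annotator login name -> associated device identification code
    time: str         # time at which the annotation task starts or ends

# Stand-in for the preset annotator database named in the claim.
ANNOTATOR_DB = {"alice", "bob", "carol"}

def parse_task_request(message_info):
    """Builds the start and end annotation instructions from parsed message info."""
    # Match each login name against the annotator database and associate
    # the matched annotator with its login device identification code.
    linked = {
        entry["login_name"]: entry["device_id"]
        for entry in message_info["logins"]
        if entry["login_name"] in ANNOTATOR_DB
    }
    start = AnnotationInstruction(annotators=linked, time=message_info["start_time"])
    end = AnnotationInstruction(annotators=linked, time=message_info["end_time"])
    return start, end
```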
5. The data processing method of claim 1, wherein the determining the target test question set corresponding to each of the target annotators according to the annotation profiles of the plurality of target annotators comprises:
identifying login information of each target annotator, and acquiring the annotation profile of each target annotator based on the login information;
inputting the annotation profiles of the target annotators into a pre-trained group classification model to obtain the group category of each target annotator;
extracting a plurality of key labels from the annotation profile of each target annotator, and determining the target test question set of each target annotator according to the plurality of key labels of each target annotator, the corresponding group category and the first to-be-labeled test question set in the data annotation task request, wherein the target test question set comprises a plurality of reference questions and a second to-be-labeled test question set.
6. The data processing method of claim 5, wherein the determining the target test question set of each target annotator according to the plurality of key labels of each target annotator, the corresponding group category and the first to-be-annotated test question set in the data annotation task request comprises:
determining a second to-be-labeled test question set of each target annotator from the first to-be-labeled test question set in the data annotation task request according to the plurality of key labels and corresponding group category of each target annotator and a preset screening rule;
calculating the similarity between the linguistic data to be labeled in the data labeling task request and each labeled linguistic data in the constructed labeled corpus;
extracting a plurality of first test questions corresponding to a plurality of labeled corpora with the highest calculated similarity, extracting a plurality of second test questions from the plurality of first test questions according to the plurality of key labels and corresponding group category of each target annotator, and determining the plurality of second test questions as the plurality of reference questions of each target annotator;
and combining the plurality of reference questions of each target annotator and the corresponding second test question set to be annotated to obtain the target test question set of each target annotator.
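The two-stage selection in claim 6 can be sketched as follows: keep the candidate corpus entries with the highest similarity, then filter by the annotator's key labels and group category. The candidate field names (`question`, `similarity`, `labels`, `groups`) are hypothetical; the similarity values are assumed to come from the corpus-similarity calculation in the claim.

```python
def select_reference_questions(candidates, key_labels, group_category, top_n=3):
    """candidates: labeled-corpus entries, each a dict with fields
    'question', 'similarity', 'labels' and 'groups'.
    Stage 1 keeps the top_n entries with the highest similarity
    (the claimed 'first test questions'); stage 2 filters them by the
    annotator's key labels and group category ('second test questions')."""
    first_round = sorted(candidates, key=lambda c: c["similarity"], reverse=True)[:top_n]
    return [
        c["question"] for c in first_round
        if set(key_labels) & set(c["labels"]) and group_category in c["groups"]
    ]
```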
7. A data processing apparatus, characterized in that the apparatus comprises:
the analysis module is used for analyzing the received data annotation task request to obtain a plurality of target annotators, a start annotation instruction and an end annotation instruction;
the determining module is used for determining the target test question set corresponding to each target annotator according to the annotation profiles of the plurality of target annotators;
the first sending module is used for sending the corresponding target test question sets to the plurality of target annotators simultaneously in response to the start annotation instruction, and collecting the annotation results of the plurality of target annotators simultaneously in response to the end annotation instruction;
the label calculation module is used for performing label calculation on the labeling results of the plurality of target annotators according to a preset inference algorithm to obtain a feedback result of each target annotator, and comprises:
identifying a plurality of reference questions in the target test question set of each target annotator;
matching the first labeling result of each reference question with the standard result of the corresponding reference question to obtain a first score of each target annotator;
calculating a second score of the plurality of reference questions of each target annotator;
calculating a quotient of the first score and the second score to determine a confidence of each target annotator;
extracting a second test question set to be labeled from the target test question sets of the plurality of target annotators;
identifying a second labeling result of each test question to be labeled in the second test question set to be labeled, wherein the second labeling result comprises one or more labeling results;
when each test question to be labeled contains a second labeling result, determining the number of the second labeling result as the number corresponding to the correct labeling answer of each test question to be labeled; or, when each test question to be labeled contains a plurality of second labeling results, calculating a number corresponding to a correct labeling answer of each test question to be labeled according to the plurality of second labeling results, and calculating by adopting the following formula:
i* = argmax_w Σ_j ( S_iw · ln c_j + S̄_iw · ln(1 − c_j) )

wherein i* represents the number corresponding to the correct labeling answer of each test question to be labeled, j runs over the target annotators of each test question to be labeled, S_iw represents the total number of the w-th second labeling results of the i-th test question to be labeled, S̄_iw denotes the complement of S_iw, and c_j represents the confidence of the target annotator corresponding to each second labeling result;
determining a correct labeling answer corresponding to each test question to be labeled according to the number corresponding to the correct labeling answer of each test question to be labeled;
analyzing a target test question set corresponding to each target annotator according to a correct annotation answer corresponding to each test question to be annotated to obtain a feedback result of each target annotator;
and the second sending module is used for sending the feedback result of each target annotator to the corresponding target annotator client.
8. An electronic device, characterized in that the electronic device comprises a processor and a memory, the processor being configured to implement the data processing method of any one of claims 1 to 6 when executing the computer program stored in the memory.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the data processing method of any one of claims 1 to 6.
CN202111022497.3A 2021-09-01 2021-09-01 Data processing method and device, electronic equipment and storage medium Active CN113469291B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111022497.3A CN113469291B (en) 2021-09-01 2021-09-01 Data processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111022497.3A CN113469291B (en) 2021-09-01 2021-09-01 Data processing method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113469291A CN113469291A (en) 2021-10-01
CN113469291B true CN113469291B (en) 2021-11-30

Family

ID=77867113

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111022497.3A Active CN113469291B (en) 2021-09-01 2021-09-01 Data processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113469291B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115146716B (en) * 2022-06-22 2024-06-14 腾讯科技(深圳)有限公司 Labeling method, labeling device, labeling apparatus, labeling storage medium and labeling program product

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104573359A (en) * 2014-12-31 2015-04-29 浙江大学 Method for integrating crowdsource annotation data based on task difficulty and annotator ability
CN107705034A (en) * 2017-10-26 2018-02-16 医渡云(北京)技术有限公司 Mass-rent platform implementation method and device, storage medium and electronic equipment
CN112488222A (en) * 2020-12-05 2021-03-12 武汉中海庭数据技术有限公司 Crowdsourcing data labeling method, system, server and storage medium
CN112749308A (en) * 2019-10-31 2021-05-04 北京国双科技有限公司 Data labeling method and device and electronic equipment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9355359B2 (en) * 2012-06-22 2016-05-31 California Institute Of Technology Systems and methods for labeling source data using confidence labels
CN110457494A (en) * 2019-08-01 2019-11-15 新华智云科技有限公司 Data mask method, device, electronic equipment and storage medium
CN113032649A (en) * 2019-12-24 2021-06-25 华为技术有限公司 Method and device for labeling data, terminal equipment and storage medium
CN111414950B (en) * 2020-03-13 2023-08-18 天津美腾科技股份有限公司 Ore picture labeling method and system based on labeling person professional management




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant