CN110288007B - Data labeling method and device and electronic equipment - Google Patents


Info

Publication number
CN110288007B
Authority
CN
China
Prior art keywords
data
labeling
sample
target
labeled
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910487643.6A
Other languages
Chinese (zh)
Other versions
CN110288007A (en)
Inventor
刘宇达 (Liu Yuda)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sankuai Online Technology Co Ltd
Priority to CN201910487643.6A
Publication of CN110288007A
Priority to PCT/CN2019/123406
Application granted
Publication of CN110288007B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning

Abstract

The application provides a data annotation method, apparatus, and electronic device. A specific implementation of the method comprises the following steps: labeling a plurality of items of data to be labeled with a pre-trained target labeling model to obtain a target set formed by the labeling results corresponding to the data to be labeled; selecting untrusted data from the data to be labeled using a pre-trained target classifier, so that the labeling results corresponding to the untrusted data can be verified; and correcting, in the target set, the labeling results corresponding to untrusted data that fails verification. With this implementation, the labeling work no longer depends entirely on manual effort, which saves substantial human resources and improves labeling efficiency. At the same time, the labeling results can be verified in a more targeted manner, improving labeling accuracy.

Description

Data labeling method and device and electronic equipment
Technical Field
The present application relates to the field of machine learning technologies, and in particular to a data annotation method and apparatus and an electronic device.
Background
With the continuous development of artificial intelligence technology, it has been widely applied in various fields. Artificial intelligence typically involves machine learning, which requires a large amount of labeled training sample data. Currently, such data is generally labeled manually, so the labeling work depends excessively on people, the workload is huge, substantial human resources are consumed, and labeling efficiency is low.
Disclosure of Invention
In order to solve one of the above technical problems, the present application provides a method, an apparatus and an electronic device for data annotation.
According to a first aspect of embodiments of the present application, there is provided a method for data annotation, including:
labeling a plurality of items of data to be labeled with a pre-trained target labeling model to obtain a target set formed by the labeling results corresponding to the data to be labeled;
selecting untrusted data from the data to be labeled using a pre-trained target classifier, so as to verify the labeling results corresponding to the untrusted data;
and correcting, in the target set, the labeling results corresponding to untrusted data that fails verification.
Optionally, the method further includes:
determining untrusted data that fails verification as first positive samples, and determining untrusted data that passes verification as first negative samples;
updating the target classifier with the first positive samples and the first negative samples.
Optionally, the target labeling model is trained as follows:
iteratively executing an update operation on the labeling model until a stop condition is met, and taking the iteratively updated labeling model as the target labeling model; wherein the update operation comprises:
selecting untrusted samples from the sample data using the current target classifier;
obtaining the manual labeling results corresponding to the untrusted samples as first sample results;
and updating the current labeling model using the first sample results.
Optionally, the update operation further comprises:
selecting trusted samples from the sample data using the current target classifier;
labeling the trusted samples with the current labeling model to obtain second sample results;
determining verification results obtained by verifying the second sample results;
based on the verification results, determining second sample results that fail verification as second negative samples and second sample results that pass verification as second positive samples;
and updating the current target classifier with the second positive samples and the second negative samples.
Optionally, the target labeling model includes a plurality of unit models with different structures, and the data to be labeled is image data to be labeled;
any item of image data to be labeled is labeled as follows:
labeling the image data with each unit model separately;
and, based on the results produced by the unit models on the image data, determining one or more labeling targets in the image data using a non-maximum suppression (NMS) algorithm and determining the label corresponding to each labeling target.
Optionally, after correcting the labeling results corresponding to untrusted data that fails verification in the target set, the method further comprises:
storing the corrected target set in a pre-established labeling database;
and updating the target labeling model with the labeling database.
According to a second aspect of the embodiments of the present application, there is provided an apparatus for data annotation, including:
a labeling module, configured to label a plurality of items of data to be labeled with a pre-trained target labeling model to obtain a target set formed by the labeling results corresponding to the data to be labeled;
a selecting module, configured to select untrusted data from the data to be labeled using a pre-trained target classifier, so as to verify the labeling results corresponding to the untrusted data;
and a correcting module, configured to correct, in the target set, the labeling results corresponding to untrusted data that fails verification.
Optionally, the apparatus further includes:
a determining module, configured to determine untrusted data that fails verification as first positive samples and untrusted data that passes verification as first negative samples;
and a first updating module, configured to update the target classifier with the first positive samples and the first negative samples.
According to a third aspect of embodiments herein, there is provided a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of any one of the above first aspects.
According to a fourth aspect of embodiments of the present application, there is provided an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any one of the first aspect when executing the program.
The technical solutions provided by the embodiments of the present application can have the following beneficial effects:
In the data labeling method and apparatus provided by the embodiments of the present application, a plurality of items of data to be labeled are labeled with a pre-trained target labeling model to obtain a target set formed by the corresponding labeling results; a pre-trained target classifier then selects untrusted data from the data to be labeled so that the corresponding labeling results can be verified, and the labeling results corresponding to untrusted data that fails verification are corrected in the target set. In these embodiments, after the pre-trained target labeling model labels the data to be labeled, the target classifier screens out the items most likely to be mislabeled for spot-checking, and erroneous labeling results are corrected. The labeling work therefore no longer depends entirely on manual effort, which saves substantial human resources and improves labeling efficiency. At the same time, labeling results can be verified in a more targeted manner, improving labeling accuracy.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
FIG. 1 is a flowchart illustrating a data annotation method according to an exemplary embodiment of the present application;
FIG. 2 is a flowchart illustrating another data annotation method according to an exemplary embodiment of the present application;
FIG. 3 is a flowchart illustrating another data annotation method according to an exemplary embodiment of the present application;
FIG. 4 is a block diagram of a data annotation apparatus according to an exemplary embodiment of the present application;
FIG. 5 is a block diagram of another data annotation apparatus according to an exemplary embodiment of the present application;
FIG. 6 is a block diagram of another data annotation apparatus according to an exemplary embodiment of the present application;
FIG. 7 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present application.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. Where the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application; rather, they are merely examples of apparatus and methods consistent with some aspects of the present application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. Depending on the context, the word "if" as used herein may be interpreted as "upon", "when", or "in response to determining".
As shown in fig. 1, fig. 1 is a flowchart illustrating a data annotation method according to an exemplary embodiment; the method may be applied to a terminal device or a server. The method comprises the following steps:
in step 101, a plurality of data to be labeled are labeled through a pre-trained target labeling model, so as to obtain a target set formed by labeling results corresponding to the data to be labeled.
In this embodiment, a plurality of data to be labeled can be labeled through a pre-trained target labeling model to obtain a target set, where the target set is formed by labeling results corresponding to the data to be labeled. The data to be annotated can be image type data, sound type data, text type data, and the like, and it can be understood that the data to be annotated can also be any other types of data, and the specific type aspect of the data to be annotated is not limited in the present application.
In this embodiment, labeling the data to be labeled may mean marking a labeling target in the data and setting a corresponding label for it (e.g., a category label, an attribute label, an ID label, etc.). Taking image data as an example, labeling may mean marking a target object (i.e., a labeling target) in the image with a bounding box and setting the target object's labels (e.g., object category, object attributes, object ID, etc.).
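To make the notion of a labeling result concrete, the following minimal sketch shows one way an image labeling result with a bounding box and its labels could be represented; all names and fields here are illustrative assumptions, not structures defined by the patent.

```python
from dataclasses import dataclass, field

@dataclass
class BoxLabel:
    """One labeling target: a bounding box plus its labels (hypothetical schema)."""
    box: tuple            # (x_min, y_min, x_max, y_max) in pixels
    category: str         # category label, e.g. object type
    attributes: dict = field(default_factory=dict)  # attribute labels
    object_id: int = -1   # ID label; -1 means not yet assigned

@dataclass
class LabelingResult:
    """Labeling result for one item of data to be labeled."""
    data_id: str
    targets: list         # list of BoxLabel

# A target set is then simply the collection of labeling results.
result = LabelingResult(
    data_id="img_0001",
    targets=[BoxLabel(box=(10, 20, 110, 220), category="pedestrian")],
)
target_set = {result.data_id: result}
```

The target set built in step 101 would hold one such record per item of data to be labeled.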
In this embodiment, the target labeling model may be a single model or a plurality of models with different structures; optionally, if the data to be labeled is image data, the target labeling model may be a neural network model. When the target labeling model is initially trained, the collected training sample data can be labeled manually, and the target labeling model is then trained on the manually labeled training sample data. During subsequent use, the target labeling model can be continuously optimized and updated so that its labeling accuracy keeps improving.
In this embodiment, if the target labeling model includes a plurality of unit models with different structures and the data to be labeled is image data, any item of image data can be labeled as follows. First, each unit model labels the image data separately. Then, based on the results produced by the unit models, one or more labeling targets in the image data are determined using a non-maximum suppression (NMS) algorithm, and the label corresponding to each labeling target is determined, thereby completing the labeling of the image data. Here, a labeling target is an object in the image data that needs to be labeled. Because unit models with different structures are used for labeling, labeling accuracy can be improved.
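As a sketch of how the unit models' outputs could be merged, the hedged example below pools candidate (box, score, label) tuples from several unit models and applies standard greedy non-maximum suppression; the IoU threshold and the toy boxes are illustrative choices, not values from the patent.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms_merge(unit_model_outputs, iou_thresh=0.5):
    """Pool candidates from all unit models, then greedily keep the
    highest-scoring boxes while suppressing overlapping ones."""
    candidates = [c for output in unit_model_outputs for c in output]
    candidates.sort(key=lambda c: c[1], reverse=True)  # by score, descending
    kept = []
    for box, score, label in candidates:
        if all(iou(box, k[0]) < iou_thresh for k in kept):
            kept.append((box, score, label))
    return kept  # the labeling targets and their labels

# Two hypothetical unit models proposing overlapping boxes for one object
model_a = [((10, 10, 100, 100), 0.9, "car")]
model_b = [((12, 11, 102, 99), 0.8, "car"), ((200, 200, 260, 260), 0.7, "person")]
merged = nms_merge([model_a, model_b])
```

The two near-identical "car" proposals collapse into one labeling target, while the non-overlapping "person" proposal is kept as a second target.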
In step 102, a pre-trained target classifier is used to select untrusted data from the data to be labeled, so that the labeling results corresponding to the untrusted data can be verified.
In this embodiment, a pre-trained target classifier may be used to select untrusted data from the data to be labeled, so as to verify the corresponding labeling results. Untrusted data is data that is highly confusable, has indistinct features, and is hard to distinguish and hard to label, making labeling errors likely. Taking image data as an example, untrusted data may be image data in which the image is blurred, the image contains highly interfering objects, or the target object is indistinct. Taking sound data as an example, untrusted data may be sound data with loud environmental noise or a faint target sound. Taking text data as an example, untrusted data may be text data with fuzzy semantic features.
In this embodiment, the target classifier may be any pre-trained classifier; any classifier known in the art, or appearing in the future, that can serve this purpose may be used, and the present application does not limit the specific type of the target classifier. When the target classifier is initially trained, the training sample data can be divided into trusted and untrusted sample data by manual classification and screening, and the target classifier is then trained on the manually screened training sample data. During subsequent use, the target classifier can be continuously optimized and updated so that its classification accuracy keeps improving.
In this embodiment, the target classifier selects, from the data to be labeled, the untrusted data that is prone to labeling errors; the labeling results corresponding to the untrusted data are then verified to determine whether they are correct. Verification can be performed manually or in any other reasonable manner; it is understood that the present application does not limit the specific verification manner. For any item of untrusted data, if its labeling result is determined to be wrong, the item fails verification; if its labeling result is determined to be correct, the item passes verification.
In step 103, the labeling results corresponding to untrusted data that fails verification are corrected in the target set.
In this embodiment, for each selected item of untrusted data, if the item passes verification, no operation is performed. If the item fails verification, it needs to be correctly relabeled to obtain a correct labeling result. The correct labeling result then replaces the wrong one in the target set, thereby correcting the labeling results corresponding to untrusted data that fails verification.
In this embodiment, a labeling database may be pre-established for storing labeled data, and machine learning training is performed with the data stored in the labeling database. The corrected target set can be stored in the labeling database as labeled data.
In the data labeling method provided by this embodiment of the present application, a plurality of items of data to be labeled are labeled with a pre-trained target labeling model to obtain a target set formed by the corresponding labeling results; a pre-trained target classifier selects untrusted data from the data to be labeled so that the corresponding labeling results can be verified, and the labeling results corresponding to untrusted data that fails verification are corrected in the target set. In this embodiment, after the pre-trained target labeling model labels the data, the target classifier screens out the items most likely to be mislabeled for spot-checking, and erroneous labeling results are corrected. The labeling work therefore no longer depends entirely on manual effort, which saves substantial human resources and improves labeling efficiency. At the same time, labeling results can be verified in a more targeted manner, improving labeling accuracy.
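The flow of steps 101 to 103 can be sketched as the following pipeline; the labeling model, classifier, verification step, and relabeling step are stand-in stubs for illustration, not the patent's implementations.

```python
def run_labeling_pipeline(data_items, label_fn, is_untrusted_fn, verify_fn, relabel_fn):
    """Step 101: label everything; step 102: screen untrusted items and
    verify their results; step 103: correct results that fail verification."""
    # Step 101: build the target set of labeling results
    target_set = {item: label_fn(item) for item in data_items}

    # Step 102: the classifier screens items likely to be mislabeled
    untrusted = [item for item in data_items if is_untrusted_fn(item)]

    # Step 103: relabel only the untrusted items that fail verification
    for item in untrusted:
        if not verify_fn(item, target_set[item]):
            target_set[item] = relabel_fn(item)
    return target_set

# Toy stand-ins: the "model" uppercases text, short items count as
# "untrusted", verification spot-checks, relabeling corrects manually.
items = ["cat", "dog", "elephant"]
result = run_labeling_pipeline(
    items,
    label_fn=lambda x: x.upper() if x != "dog" else "D0G",  # one wrong label
    is_untrusted_fn=lambda x: len(x) <= 3,                  # screen short items
    verify_fn=lambda x, y: y == x.upper(),                  # spot-check
    relabel_fn=lambda x: x.upper(),                         # manual correction
)
```

Only the screened items are spot-checked, so verification effort concentrates on the data most likely to be mislabeled, which is the efficiency claim of the method.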
As shown in fig. 2, fig. 2 is a flowchart illustrating another data labeling method according to an exemplary embodiment; it describes the process of updating the target classifier and may be applied to a terminal device or a server. The method comprises the following steps:
in step 201, a plurality of data to be labeled are labeled through a pre-trained target labeling model, so as to obtain a target set formed by labeling results corresponding to the data to be labeled.
In step 202, the pre-trained target classifier is used to select the non-credible data in the data to be labeled, so as to verify the labeling result corresponding to the non-credible data.
In step 203, the labeling result corresponding to the non-trusted data that fails to pass the verification in the target set is corrected.
In step 204, the non-validated untrusted data is determined to be a first positive sample, and the validated untrusted data is determined to be a first negative sample.
In this embodiment, untrusted data is highly confusable, has indistinct features, and is hard to distinguish and hard to label, making labeling errors likely. Therefore, if an item of untrusted data screened out by the target classifier was indeed mislabeled, the target classifier classified that item accurately; if the item was in fact labeled correctly, the classifier's classification of that item was not accurate enough. Accordingly, untrusted data that fails verification (i.e., mislabeled untrusted data) is taken as first positive samples, and untrusted data that passes verification (i.e., correctly labeled untrusted data) is determined as first negative samples.
In step 205, the target classifier is updated with the first positive samples and the first negative samples.
In this embodiment, the target classifier may be retrained with the first positive and first negative samples so as to optimize and update it, enabling the target classifier to screen out more appropriate untrusted data.
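As a minimal illustration of step 205, the sketch below retrains a simple classifier on the first positive samples (mislabeled untrusted data) and first negative samples (correctly labeled untrusted data). The perceptron-style learner and the two-dimensional feature representation are assumptions made for illustration; the patent does not fix the classifier type.

```python
def train_classifier(positives, negatives, epochs=20, lr=0.1):
    """Retrain a tiny linear classifier: positives are feature vectors of
    items that should be flagged as untrusted, negatives should not be."""
    dim = len(positives[0])
    w, b = [0.0] * dim, 0.0
    samples = [(x, 1) for x in positives] + [(x, -1) for x in negatives]
    for _ in range(epochs):
        for x, y in samples:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified: perceptron update
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def is_untrusted(w, b, x):
    """Flag an item as untrusted if it falls on the positive side."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b > 0

# Hypothetical 2-d features, e.g. (blur score, model confidence deficit)
first_positives = [(0.9, 0.1), (0.8, 0.2)]  # failed verification
first_negatives = [(0.1, 0.9), (0.2, 0.8)]  # passed verification
w, b = train_classifier(first_positives, first_negatives)
```

After retraining, items resembling the mislabeled samples are flagged for spot-checking while items resembling the correctly labeled samples are not, which is the sense in which the screened untrusted data becomes "more targeted".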
It should be noted that steps identical to those in the embodiment of fig. 1 are not repeated in the embodiment of fig. 2; for related content, refer to the embodiment of fig. 1.
In the data labeling method provided by this embodiment of the present application, a plurality of items of data to be labeled are labeled with a pre-trained target labeling model to obtain a target set formed by the corresponding labeling results; a pre-trained target classifier selects untrusted data from the data to be labeled so that the corresponding labeling results can be verified, and the labeling results corresponding to untrusted data that fails verification are corrected in the target set. Untrusted data that fails verification is determined as first positive samples, untrusted data that passes verification is determined as first negative samples, and the target classifier is updated with these samples. This not only saves substantial human resources and improves labeling efficiency and accuracy, but also, because the target classifier is continuously optimized and updated during labeling, makes the untrusted data it screens out more targeted.
As shown in fig. 3, fig. 3 is a flowchart illustrating another data annotation method according to an exemplary embodiment; it describes in detail the process of updating the target labeling model and may be applied to a terminal device or a server. The method comprises the following steps:
in step 301, a plurality of data to be labeled are labeled through a pre-trained target labeling model, so as to obtain a target set formed by labeling results corresponding to the data to be labeled.
In step 302, a pre-trained target classifier is used to select the non-credible data in the data to be labeled, so as to verify the labeling result corresponding to the non-credible data.
In step 303, the labeled result corresponding to the non-trusted data that fails to pass the verification in the target set is corrected.
In step 304, the corrected target set is stored in a pre-established annotation database.
In step 305, the target annotation model is updated with the annotation database.
In this embodiment, a label database may be pre-established, where the label database is used to store labeled data, and the machine learning training is performed by using the data stored in the label database. The corrected target set can be stored in the label database as labeled data.
In this embodiment, the data in the annotation database may be used as a training sample to retrain the target annotation model, so as to perform optimization updating on the target annotation model, thereby making the result of the target annotation model for annotating the data to be annotated more accurate.
It should be noted that steps identical to those in the embodiments of fig. 1 and fig. 2 are not repeated in the embodiment of fig. 3; for related content, refer to the embodiments of fig. 1 and fig. 2.
In the data labeling method provided by the above embodiment of the present application, a plurality of items of data to be labeled are labeled with a pre-trained target labeling model to obtain a target set formed by the corresponding labeling results; a pre-trained target classifier selects untrusted data from the data to be labeled so that the corresponding labeling results can be verified; the labeling results corresponding to untrusted data that fails verification are corrected in the target set; the corrected target set is stored in a pre-established labeling database; and the labeling database is used to update the target labeling model. This not only saves substantial human resources and improves labeling efficiency, but also, because the target labeling model is optimized and updated with the labeling database after each round of labeling, continuously improves the model and further increases labeling accuracy.
In some alternative embodiments, the target labeling model may be trained by iteratively executing an update operation on the labeling model until a stop condition is met, and taking the iteratively updated labeling model as the target labeling model.
In this embodiment, when the labeling model is initially trained, the collected training sample data may be labeled manually, and the labeling model is trained entirely on the manually labeled training sample data. Once the trained labeling model reaches a certain accuracy on the data to be labeled, the method of this embodiment can be used to continue training it. Specifically, the update operation on the labeling model may be executed iteratively until a stop condition is satisfied (for example, an objective function converges, or the number of iterations exceeds a preset number), and the iteratively updated labeling model is taken as the target labeling model. The update operation may include the following steps:
step a: and selecting an untrusted sample from the sample data by using the current target classifier.
In this embodiment, firstly, an untrusted sample in sample data may be selected by using a currently trained target classifier, where the untrusted sample may be sample data that is relatively confusing, has insignificant characteristics, is difficult to distinguish, is difficult to label, and is easy to be labeled incorrectly.
Step b: and acquiring a manual marking result corresponding to the non-credible sample as a first sample result.
Step c: and updating the current annotation model by using the first sample result.
In this embodiment, since the untrusted sample has strong confusability and insignificant features, it is more likely that an error occurs when the untrusted sample is labeled by using the labeling model. Therefore, the non-trusted sample can be labeled manually, so that a manual labeling result corresponding to the non-trusted sample is obtained and serves as the first sample result. The first sample result has higher accuracy, and the current annotation model can be continuously updated by using the first sample result, so that the annotation model is continuously perfected until a stopping condition is met, and the target annotation model is obtained. Therefore, the accuracy of the target labeling model is improved.
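The iterative update operation of steps a to c can be sketched as the following training loop; the stop condition (an iteration budget plus "no untrusted samples left"), the classifier, and the model-update routine are placeholder stubs chosen for illustration, not the patent's implementation.

```python
def train_target_labeling_model(sample_data, classifier, model,
                                get_manual_labels, update_model, max_iters=10):
    """Iteratively refine the labeling model on the hard (untrusted) samples."""
    for _ in range(max_iters):  # stop condition: iteration budget
        # Step a: the classifier screens out untrusted (hard) samples
        untrusted = [s for s in sample_data if classifier(s)]
        if not untrusted:
            break  # nothing hard left to learn from
        # Step b: manual labels for the hard samples (first sample results)
        first_results = {s: get_manual_labels(s) for s in untrusted}
        # Step c: update the current labeling model with those results
        model = update_model(model, first_results)
    return model  # the target labeling model

# Toy run: the "model" is a dict of learned labels; a sample counts as
# untrusted until the model knows its label.
samples = ["a", "b", "c"]
model_state = {}
trained = train_target_labeling_model(
    samples,
    classifier=lambda s: s not in model_state,
    model=model_state,
    get_manual_labels=lambda s: s.upper(),      # stands in for manual labeling
    update_model=lambda m, r: (m.update(r) or m),
)
```

In the toy run the first iteration flags all three samples as untrusted, the "manual" labels are absorbed into the model, and the second iteration finds nothing left to learn, so the loop stops early.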
In further alternative embodiments, the update operation may further include the following steps:
step d: and selecting a credible sample in the sample data by using the current target classifier.
In this embodiment, the current target classifier may further be used to select trusted samples from the sample data, the counterpart of the untrusted samples. A trusted sample is sample data whose characteristics are distinct and easy to distinguish, making it easy to label and unlikely to be labeled incorrectly.
Step e: label the trusted sample with the current labeling model to obtain a second sample result.
Step f: determine a verification result by verifying the second sample result.
In this embodiment, because the characteristics of a trusted sample are distinct and easy to distinguish, the labeling model is unlikely to make errors when labeling it. The current labeling model can therefore be used to label the trusted sample, yielding a second sample result, which can then be verified to determine whether the current labeling model labeled the trusted sample correctly.
Step g: based on the verification result, determine a second sample result that fails verification as a second negative sample, and a second sample result that passes verification as a second positive sample.
In this embodiment, if trusted data screened out by the current target classifier is labeled incorrectly by the labeling model, the target classifier's classification of that data was not accurate enough; if the data is labeled correctly, the classification was accurate. Therefore, a second sample result that fails verification (i.e., trusted data that was labeled incorrectly) may be determined as a second negative sample, and a second sample result that passes verification (i.e., trusted data that was labeled correctly) may be determined as a second positive sample.
Step h: update the current target classifier with the second positive sample and the second negative sample.
In this embodiment, the second positive sample and the second negative sample can be used to optimize and update the current target classifier, so that it screens trusted and untrusted data more accurately.
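Steps e-h amount to a small refinement loop. The sketch below is a minimal illustration under assumed interfaces: `ToyClassifier`, `label_fn`, and `verify_fn` are hypothetical stand-ins, not the patent's actual models.

```python
class ToyClassifier:
    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.updates = 0

    def is_trusted(self, sample):
        # stand-in trust rule: high-confidence samples count as "trusted"
        return sample["conf"] >= self.threshold

    def update(self, positives, negatives):
        # a real classifier would refit on the new samples here
        self.updates += 1


def refine_classifier(classifier, label_fn, verify_fn, samples):
    """Steps e-h: label trusted samples, verify the results, split them into
    second positive/negative samples, and update the classifier."""
    trusted = [s for s in samples if classifier.is_trusted(s)]
    positives, negatives = [], []
    for s in trusted:
        result = label_fn(s)                  # step e: label with current model
        if verify_fn(s, result):              # step f: verify the result
            positives.append((s, result))     # step g: second positive sample
        else:
            negatives.append((s, result))     # step g: second negative sample
    classifier.update(positives, negatives)   # step h: update the classifier
    return positives, negatives
```

Only the trusted subset enters this loop; untrusted samples follow the separate manual-labeling path of steps a-d.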
It should be noted that although the above embodiments describe the operations of the methods of the present application in a particular order, this does not require or imply that the operations must be performed in that order, or that all of the illustrated operations must be performed, to achieve desirable results. Rather, the steps depicted in the flowcharts may be executed in a different order. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one, and/or one step may be broken down into multiple steps.
Corresponding to the foregoing embodiments of the data labeling method, the present application also provides embodiments of a data labeling apparatus.
As shown in fig. 4, fig. 4 is a block diagram of a data labeling apparatus according to an exemplary embodiment of the present application. The apparatus may include: a labeling module 401, a selecting module 402, and a correcting module 403.
The labeling module 401 is configured to label multiple pieces of data to be labeled using a pre-trained target labeling model, obtaining a target set formed by the labeling results corresponding to the data to be labeled.
The selecting module 402 is configured to select untrusted data from the data to be labeled using a pre-trained target classifier, so that the labeling results corresponding to the untrusted data can be verified.
The correcting module 403 is configured to correct, in the target set, the labeling results corresponding to untrusted data that fails verification.
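The interaction of these three modules (label, screen for untrusted data, verify, correct) can be sketched as a single pass over the data. All interfaces below (`label_fn`, `is_untrusted`, `verify_fn`, `correct_fn`) are illustrative assumptions, not the apparatus's actual components.

```python
def annotate(data, label_fn, is_untrusted, verify_fn, correct_fn):
    """One pass of the label -> screen -> verify -> correct pipeline."""
    # labeling module 401: label everything, forming the target set
    target_set = {i: label_fn(d) for i, d in enumerate(data)}
    for i, d in enumerate(data):
        # selecting module 402: only untrusted data is verified
        if is_untrusted(d) and not verify_fn(d, target_set[i]):
            # correcting module 403: fix results that fail verification
            target_set[i] = correct_fn(d)
    return target_set
```

The point of the screening step is cost: trusted data keeps its model-produced label unchecked, so only the untrusted minority needs verification or manual correction.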
As shown in fig. 5, fig. 5 is a block diagram of another data labeling apparatus according to an exemplary embodiment of the present application. On the basis of the embodiment shown in fig. 4, the apparatus may further include: a determining module 404 and a first updating module 405.
The determining module 404 is configured to determine untrusted data that fails verification as a first positive sample, and untrusted data that passes verification as a first negative sample.
The first updating module 405 is configured to update the target classifier with the first positive sample and the first negative sample.
In some alternative embodiments, the target labeling model may be trained by iteratively executing an update operation on the labeling model until a stopping condition is met, and taking the iteratively updated labeling model as the target labeling model. The update operation may include: selecting an untrusted sample from the sample data using the current target classifier, obtaining a manual labeling result corresponding to the untrusted sample as a first sample result, and updating the current labeling model with the first sample result.
In still other optional embodiments, the update operation may further include: selecting a trusted sample from the sample data using the current target classifier; labeling the trusted sample with the current labeling model to obtain a second sample result; determining a verification result by verifying the second sample result; based on the verification result, determining a second sample result that fails verification as a second negative sample and a second sample result that passes verification as a second positive sample; and updating the current target classifier with the second positive sample and the second negative sample.
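Putting the two halves of the update operation together, the iterative training loop described above might look like the following sketch. The stub classes, the trust rule, and the iteration-cap stopping condition are all assumptions for illustration, not the patent's actual models.

```python
class StubModel:
    """Stand-in labeling model: predicts parity and records its updates."""
    def __init__(self):
        self.seen = []
    def label(self, sample):
        return sample % 2
    def update(self, first_results):
        self.seen.extend(first_results)

class StubClassifier:
    """Stand-in target classifier with a fixed trust rule."""
    def is_trusted(self, sample):
        return sample >= 5
    def update(self, positives, negatives):
        pass  # a real classifier would refit here

def train_target_model(model, classifier, samples, manual_label, verify,
                       max_iters=3):
    """Iterate the update operation until the stopping condition is met."""
    for _ in range(max_iters):               # stopping condition: iteration cap
        # part 1: untrusted samples get manual labels (first sample results)
        untrusted = [s for s in samples if not classifier.is_trusted(s)]
        model.update([(s, manual_label(s)) for s in untrusted])
        # part 2: trusted samples are labeled by the model, then verified
        trusted = [s for s in samples if classifier.is_trusted(s)]
        second = [(s, model.label(s)) for s in trusted]
        pos = [r for r in second if verify(*r)]       # second positive samples
        neg = [r for r in second if not verify(*r)]   # second negative samples
        classifier.update(pos, neg)
    return model
```

Each iteration thus improves both components: manual labels on hard (untrusted) samples improve the labeling model, while verification outcomes on easy (trusted) samples improve the classifier.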
In other alternative embodiments, the target labeling model may include multiple unit models with different structures, and the data to be labeled is image data to be labeled.
Any piece of image data to be labeled can be labeled as follows: label the image data with each unit model separately; then, based on each unit model's labeling results for the image data, determine one or more labeling targets in the image data using a non-maximum suppression (NMS) algorithm, and determine the label corresponding to each labeling target.
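A minimal sketch of that fusion step: detections from all unit models are pooled and deduplicated with NMS. The box tuple layout `(x1, y1, x2, y2, score, label)` and the IoU threshold are assumptions for illustration, not the patent's actual format.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2, ...) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def nms_merge(detections, iou_thresh=0.5):
    """Pool detections from all unit models; keep the highest-scoring box of
    each overlapping cluster and suppress the rest."""
    kept = []
    for d in sorted(detections, key=lambda d: d[4], reverse=True):
        if all(iou(d, k) < iou_thresh for k in kept):
            kept.append(d)
    return kept
```

Because different unit models tend to fire on the same object with slightly different boxes, greedy NMS over the pooled detections collapses each object to the single most confident box, whose class becomes the labeling target's label.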
As shown in fig. 6, fig. 6 is a block diagram of another data labeling apparatus according to an exemplary embodiment of the present application. On the basis of the embodiment shown in fig. 4, the apparatus may further include: a storing module 406 and a second updating module 407.
The storing module 406 is configured to store the corrected target set into a pre-established labeling database.
The second updating module 407 is configured to update the target labeling model using the labeling database.
It should be understood that the above apparatus may be preset in a terminal device or a server, or may be loaded into the terminal device or server by downloading or the like. The corresponding modules in the apparatus can cooperate with modules in the terminal device or server to implement the data labeling scheme.
For the apparatus embodiments, since they substantially correspond to the method embodiments, reference may be made to the relevant parts of the method embodiment descriptions. The apparatus embodiments described above are merely illustrative: units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the present application, which one of ordinary skill in the art can understand and implement without inventive effort.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed, performs the data labeling method provided in any one of the embodiments of figs. 1 to 3.
Corresponding to the data labeling method above, an embodiment of the present application further provides a schematic structural diagram of an electronic device, shown in fig. 7 according to an exemplary embodiment of the present application. Referring to fig. 7, at the hardware level, the electronic device includes a processor, an internal bus, a network interface, a memory, and a non-volatile memory, and may also include hardware required for other services. The processor reads the corresponding computer program from the non-volatile memory into the memory and runs it, forming the data labeling apparatus at the logical level. Of course, besides software implementations, the present application does not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the execution subject of the processing flow is not limited to logic units and may also be hardware or logic devices.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (7)

1. A method of data labeling, the method comprising:
labeling multiple pieces of data to be labeled through a pre-trained target labeling model to obtain a target set formed by the labeling results corresponding to the data to be labeled;
selecting untrusted data from the data to be labeled using a pre-trained target classifier, so as to verify the labeling results corresponding to the untrusted data;
correcting, in the target set, the labeling results corresponding to untrusted data that fails verification, storing the corrected target set into a pre-established labeling database, and updating the target labeling model with the labeling database;
wherein the data to be labeled serves as training sample data for machine learning, and the labeling results serve as labels of the training sample data;
the method further comprising:
determining untrusted data that fails verification as a first positive sample, and determining untrusted data that passes verification as a first negative sample; and
updating the target classifier with the first positive sample and the first negative sample.
2. The method of claim 1, wherein the target labeling model is trained by:
iteratively executing an update operation on the labeling model until a stopping condition is met, and taking the iteratively updated labeling model as the target labeling model; wherein the update operation comprises:
selecting an untrusted sample from the sample data using the current target classifier;
obtaining a manual labeling result corresponding to the untrusted sample as a first sample result; and
updating the current labeling model with the first sample result.
3. The method of claim 2, wherein the update operation further comprises:
selecting a trusted sample from the sample data using the current target classifier;
labeling the trusted sample with the current labeling model to obtain a second sample result;
determining a verification result by verifying the second sample result;
based on the verification result, determining a second sample result that fails verification as a second negative sample, and determining a second sample result that passes verification as a second positive sample; and
updating the current target classifier with the second positive sample and the second negative sample.
4. The method of claim 1, wherein the target labeling model comprises multiple unit models with different structures, and the data to be labeled is image data to be labeled;
wherein any piece of image data to be labeled is labeled by:
labeling the image data with each unit model separately; and
based on each unit model's labeling results for the image data, determining one or more labeling targets in the image data using a non-maximum suppression (NMS) algorithm, and determining the label corresponding to each labeling target.
5. An apparatus for data labeling, the apparatus comprising:
a labeling module, configured to label multiple pieces of data to be labeled through a pre-trained target labeling model to obtain a target set formed by the labeling results corresponding to the data to be labeled;
a selecting module, configured to select untrusted data from the data to be labeled using a pre-trained target classifier, so as to verify the labeling results corresponding to the untrusted data;
a correcting module, configured to correct, in the target set, the labeling results corresponding to untrusted data that fails verification;
a storing module, configured to store the corrected target set into a pre-established labeling database; and
a second updating module, configured to update the target labeling model with the labeling database;
wherein the data to be labeled serves as training sample data for machine learning, and the labeling results serve as labels of the training sample data;
the apparatus further comprising:
a determining module, configured to determine untrusted data that fails verification as a first positive sample, and untrusted data that passes verification as a first negative sample; and
a first updating module, configured to update the target classifier with the first positive sample and the first negative sample.
6. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of any one of claims 1-4.
7. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1-4 when executing the program.
CN201910487643.6A 2019-06-05 2019-06-05 Data labeling method and device and electronic equipment Active CN110288007B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910487643.6A CN110288007B (en) 2019-06-05 2019-06-05 Data labeling method and device and electronic equipment
PCT/CN2019/123406 WO2020244183A1 (en) 2019-06-05 2019-12-05 Data annotation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910487643.6A CN110288007B (en) 2019-06-05 2019-06-05 Data labeling method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN110288007A CN110288007A (en) 2019-09-27
CN110288007B true CN110288007B (en) 2021-02-02

Family

ID=68003424

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910487643.6A Active CN110288007B (en) 2019-06-05 2019-06-05 Data labeling method and device and electronic equipment

Country Status (2)

Country Link
CN (1) CN110288007B (en)
WO (1) WO2020244183A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110288007B (en) * 2019-06-05 2021-02-02 北京三快在线科技有限公司 Data labeling method and device and electronic equipment
CN110797101B (en) * 2019-10-28 2023-11-03 腾讯医疗健康(深圳)有限公司 Medical data processing method, medical data processing device, readable storage medium and computer equipment
CN113469205B (en) * 2020-03-31 2023-01-17 阿里巴巴集团控股有限公司 Data processing method and system, network model and training method thereof, and electronic device
CN111897991B (en) * 2020-06-19 2022-08-26 济南信通达电气科技有限公司 Image annotation method and device
CN112163424A (en) * 2020-09-17 2021-01-01 中国建设银行股份有限公司 Data labeling method, device, equipment and medium
CN112861962B (en) * 2021-02-03 2024-04-09 北京百度网讯科技有限公司 Sample processing method, device, electronic equipment and storage medium
CN112884060B (en) * 2021-03-09 2024-04-26 联仁健康医疗大数据科技股份有限公司 Image labeling method, device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101980210A (en) * 2010-11-12 2011-02-23 百度在线网络技术(北京)有限公司 Marked word classifying and grading method and system
CN102541838A (en) * 2010-12-24 2012-07-04 日电(中国)有限公司 Method and equipment for optimizing emotional classifier
CN104281569A (en) * 2013-07-01 2015-01-14 富士通株式会社 Building device and method, classifying device and method and electronic device
CN105117429A (en) * 2015-08-05 2015-12-02 广东工业大学 Scenario image annotation method based on active learning and multi-label multi-instance learning
CN105224947A (en) * 2014-06-06 2016-01-06 株式会社理光 Sorter training method and system

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050027664A1 (en) * 2003-07-31 2005-02-03 Johnson David E. Interactive machine learning system for automated annotation of information in text
US20100023319A1 (en) * 2008-07-28 2010-01-28 International Business Machines Corporation Model-driven feedback for annotation
US11074495B2 (en) * 2013-02-28 2021-07-27 Z Advanced Computing, Inc. (Zac) System and method for extremely efficient image and pattern recognition and artificial intelligence platform
CN103324937B (en) * 2012-03-21 2016-08-03 日电(中国)有限公司 The method and apparatus of label target
US8855430B1 (en) * 2012-05-30 2014-10-07 Google Inc. Refining image annotations
CN108875768A (en) * 2018-01-23 2018-11-23 北京迈格威科技有限公司 Data mask method, device and system and storage medium
CN108197668A (en) * 2018-01-31 2018-06-22 达闼科技(北京)有限公司 The method for building up and cloud system of model data collection
CN109241997B (en) * 2018-08-03 2022-03-22 硕橙(厦门)科技有限公司 Method and device for generating training set
CN109242013B (en) * 2018-08-28 2021-06-08 北京九狐时代智能科技有限公司 Data labeling method and device, electronic equipment and storage medium
CN109543713B (en) * 2018-10-16 2021-03-26 北京奇艺世纪科技有限公司 Training set correction method and device
CN109446961B (en) * 2018-10-19 2020-10-30 北京达佳互联信息技术有限公司 Gesture detection method, device, equipment and storage medium
CN109635838B (en) * 2018-11-12 2023-07-11 平安科技(深圳)有限公司 Face sample picture labeling method and device, computer equipment and storage medium
CN109460795A (en) * 2018-12-17 2019-03-12 北京三快在线科技有限公司 Classifier training method, apparatus, electronic equipment and computer-readable medium
CN109784391B (en) * 2019-01-04 2021-01-05 杭州比智科技有限公司 Multi-model-based sample labeling method and device
CN110288007B (en) * 2019-06-05 2021-02-02 北京三快在线科技有限公司 Data labeling method and device and electronic equipment

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101980210A (en) * 2010-11-12 2011-02-23 百度在线网络技术(北京)有限公司 Marked word classifying and grading method and system
CN102541838A (en) * 2010-12-24 2012-07-04 日电(中国)有限公司 Method and equipment for optimizing emotional classifier
CN104281569A (en) * 2013-07-01 2015-01-14 富士通株式会社 Building device and method, classifying device and method and electronic device
CN105224947A (en) * 2014-06-06 2016-01-06 株式会社理光 Sorter training method and system
CN105117429A (en) * 2015-08-05 2015-12-02 广东工业大学 Scenario image annotation method based on active learning and multi-label multi-instance learning

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
"Baselines for Image Annotation"; A. Makadia; International Journal of Computer Vision; 20101231; full text *
"Multiple Bernoulli relevance models for image and video annotation"; Feng SL; Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition; 20041231; full text *
"Improved Faster RCNN training method based on hard negative sample mining" (基于难负样本挖掘的改进Faster RCNN训练方法); Ai Tuo; Computer Science (《计算机科学》); 20180531; vol. 45, no. 5; full text *
"Research on automatic image annotation and annotation refinement algorithms" (自动图像标注及标注改善算法的研究); Song Haiyu; China Doctoral Dissertations Full-text Database (electronic journal), Information Science and Technology; 20121215; full text *

Also Published As

Publication number Publication date
CN110288007A (en) 2019-09-27
WO2020244183A1 (en) 2020-12-10

Similar Documents

Publication Publication Date Title
CN110288007B (en) Data labeling method and device and electronic equipment
US10671511B2 (en) Automated bug fixing
CN108351986B (en) Learning system, learning apparatus, training data generating method, and computer readable medium
AU2020200909A1 (en) Evaluation control
CN110852983B (en) Method for detecting defect in semiconductor device
US10803398B2 (en) Apparatus and method for information processing
US11307975B2 (en) Machine code analysis for identifying software defects
CN115641443B (en) Method for training image segmentation network model, method for processing image and product
CN109616101B (en) Acoustic model training method and device, computer equipment and readable storage medium
US20210311729A1 (en) Code review system
CN115168868B (en) Business vulnerability analysis method and server applied to artificial intelligence
CN114787831B (en) Improving accuracy of classification models
CN113220883A (en) Text classification model performance optimization method and device and storage medium
CN112597124A (en) Data field mapping method and device and storage medium
US20140325490A1 (en) Classifying Source Code Using an Expertise Model
CN111428858A (en) Method and device for determining number of samples, electronic equipment and storage medium
CN112149698A (en) Method and device for screening difficult sample data
CN110705689B (en) Continuous learning method and device capable of distinguishing features
CN112152968B (en) Network threat detection method and device
CN114238598A (en) Question-answering system and labeling, auditing and model training method thereof
CN111814949B (en) Data labeling method and device and electronic equipment
CN112364990A (en) Method and system for realizing grammar error correction and less sample field adaptation through meta-learning
CN109672781B (en) Safety protection method and device for electronic equipment
CN111225297A (en) Broadband passive optical network port resource remediation method and system
JP7456289B2 (en) Judgment program, judgment method, and information processing device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant