WO2019095899A1

WO2019095899A1 - Material annotation method and apparatus, terminal, and computer readable storage medium

Info

Publication number: WO2019095899A1
Application number: PCT/CN2018/109774
Authority: WO
Inventors: Lu Yan (陆艳); Liu Yong (刘勇); Gao Hong (高洪)
Original assignee: ZTE Corporation (中兴通讯股份有限公司)
Priority date: 2017-11-17
Filing date: 2018-10-11
Publication date: 2019-05-23
Also published as: CN109800776A

Abstract

Disclosed are a material annotation method and apparatus, a terminal, and a computer readable storage medium. The method comprises: annotating a material in a set of materials to be annotated, according to a preset algorithm model; generating a training set corresponding to the annotation result, based on the annotation result; and updating the algorithm model by means of the training set, for use in a next material annotation.

Description

Material labeling method and device, terminal and computer readable storage medium

The present application claims priority to Chinese Patent Application No. …, filed on November 17, 2017.

Technical field

The present disclosure relates to the field of wireless communication technologies, for example, to a material annotation method and apparatus, a terminal, and a computer readable storage medium.

Background

With the rapid development of artificial intelligence, labeling and proofreading materials of all kinds still demands a great deal of time and manpower. Labeling and proofreading depend on analyzing a large amount of training material, which must be labeled in advance according to some logic. This is usually done manually, and the labeling process consumes considerable manpower and time. Labeling is, in effect, interpreting the features in a material. Different people may interpret the same material differently, so material labeling is highly subjective. Labelers also differ in their knowledge structures and grammatical theories, which makes their labeling results differ and hard to unify.

Summary of the invention

The embodiments of the present application provide a material labeling method and device, a terminal, and a computer readable storage medium, aiming to solve the problems in the related art that material labeling is time-consuming and labor-intensive and that labeling results are difficult to unify.

The embodiments of the present application provide a material labeling method, including: labeling materials in a set of materials to be labeled according to a preset algorithm model; generating, based on the labeling result, a training set corresponding to the labeling result; and updating the preset algorithm model through the training set for the next material annotation.

The embodiments of the present application further provide a material labeling device, including a material labeling module, a training generation module, and an algorithm training module.

The material labeling module is configured to label the materials in the set of materials to be labeled according to the preset algorithm model.

The training generation module is configured to generate, based on the labeling result, a training set corresponding to the labeling result.

The algorithm training module is configured to update the preset algorithm model through the training set for the next material annotation.

The embodiments of the present application further provide a terminal, including a processor, a memory, and a communication bus. The communication bus is configured to implement connection and communication between the processor and the memory, and the processor is configured to execute a material labeling program stored in the memory to implement the aforementioned material labeling method.

The embodiment of the present application further provides a computer readable storage medium storing at least one computer program executable by at least one processor to implement the foregoing material labeling method.

Brief description of the drawings

FIG. 1 is a flowchart of a material labeling method according to a first embodiment of the present application;

FIG. 2 is a schematic diagram of material labeling according to the first embodiment of the present application;

FIG. 3 is a detailed flowchart of a material labeling method according to a second embodiment of the present application;

FIG. 4 is a schematic diagram of material labeling according to a third embodiment of the present application;

FIG. 5 is a schematic diagram of the composition of a material labeling device according to a fourth embodiment of the present application;

FIG. 6 is a schematic structural diagram of a terminal according to a fifth embodiment of the present application.

Detailed description

First embodiment

Refer to FIG. 1, which is a flowchart of a material labeling method according to the first embodiment of the present application. The method includes steps S101 to S103.

In S101, the materials in the set of materials to be labeled are labeled according to a preset algorithm model.

In S102, a training set corresponding to the labeling result is generated based on the labeling result.

In S103, the preset algorithm model is updated through the training set for the next material annotation.
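The three steps can be sketched as a single annotation round. This is a minimal sketch, not the application's implementation: it assumes a hypothetical lookup-table "model" that maps each material item to a label, and a caller-supplied `manual_label` function that stands in for human annotators whenever the model has no answer.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for the "algorithm model": an item -> label lookup table.
Model = Dict[str, str]
TrainingSet = List[Tuple[str, str]]


def annotation_round(
    model: Model,
    materials: List[str],
    manual_label: Callable[[str], str],
) -> Tuple[TrainingSet, Model]:
    """Run S101-S103 once; return the training set and the updated model."""
    # S101: label with the model, falling back to manual labeling when it has no answer.
    results: TrainingSet = []
    for item in materials:
        label = model.get(item)
        if label is None:
            label = manual_label(item)
        results.append((item, label))

    # S102: the labeling result becomes a training set of (material, label) pairs.
    training_set = list(results)

    # S103: update a copy of the model so the next round can reference it.
    updated = dict(model)
    updated.update(training_set)
    return training_set, updated
```

Returning a new model rather than mutating the old one keeps each round's reference model intact, which makes it easy to compare successive versions.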

In an embodiment, the materials to be annotated may include corpora in an intelligent question-answering system, text in text recognition, and multimedia materials such as audio, video, and pictures. These materials often contain rich content that a computer may not be able to identify and read directly, so they need to be annotated. Annotation processes the materials in a material library so that the various features in each material are labeled in a computer-readable way. For example, the information presented in a picture may be labeled in the form of text; in face recognition, the facial features in an image are labeled with their pixel coordinates and pixel values; and for a corpus, various linguistic features are labeled on the corresponding language components so that a computer can easily identify and read them. The way of labeling differs according to the application scenario, but in principle it makes multiple features of the material computer-identifiable according to some logic.
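For a text corpus, one possible computer-readable form is a list of labeled character spans attached to each sentence. The record layout below (`SpanLabel`, `AnnotatedText`) and the label names are hypothetical illustrations, not a format defined by the application.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Tuple


@dataclass
class SpanLabel:
    start: int  # character offset where the labeled component begins (inclusive)
    end: int    # character offset where it ends (exclusive)
    label: str  # feature attached to that component, e.g. a part of speech


@dataclass
class AnnotatedText:
    text: str
    labels: List[SpanLabel] = field(default_factory=list)

    def spans(self) -> List[Tuple[str, str]]:
        """Return (surface text, label) pairs so a person can check the result."""
        return [(self.text[s.start:s.end], s.label) for s in self.labels]

    def to_json(self) -> str:
        """Serialize to a machine-readable JSON record."""
        return json.dumps(asdict(self), ensure_ascii=False)


doc = AnnotatedText(
    "open a savings account",
    [SpanLabel(0, 4, "VERB"), SpanLabel(7, 22, "PRODUCT")],
)
```

Storing offsets instead of copied substrings keeps the label tied to an exact position in the original text, which matters when the same word occurs more than once.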

First, the algorithm model that has been generated is determined. The algorithm model is the algorithm referenced when annotating material, and the model referenced by each subsequent annotation round is the one determined after the previous round. The algorithm model is obtained by analyzing a training set. Depending on when it is generated, the algorithm model falls roughly into two types: the initial algorithm model and the transition algorithm model. The initial algorithm model is the first algorithm model in the material annotation process, and it broadly determines the algorithmic logic of all related material annotations that follow. A transition algorithm model is any algorithm model other than the initial one; unlike the initial algorithm model, transition algorithm models usually keep changing.

In an embodiment, determining the generated algorithm model may include: manually labeling the materials in an initial material set to generate an initial training set; training on the initial training set to generate an initial algorithm model; labeling the materials in a set of materials to be labeled with reference to the initial algorithm model, and updating the initial algorithm model based on the labeling result to form a transition algorithm model; and labeling the materials of the next set of materials to be labeled with reference to the transition algorithm model, and updating the transition algorithm model based on the labeling result. Material annotation and algorithm model updates are thus performed iteratively, and the algorithm model is determined in this way. These steps describe the general way the algorithm model is generated: it starts from the initial algorithm model and takes shape over several iterations of annotation and updating.

In an embodiment, the initial algorithm model is generated as follows. First, the materials in the initial material set are labeled manually. This manual labeling does not refer to any algorithm model; how to label the multiple features of the material is decided by human judgment. After labeling is completed, an initial training set corresponding to the labeling result is generated from that result. A training set is the set used to train and generate an algorithm model; it often contains a large number of objects, and training on these objects yields the desired algorithm model. The initial training set is the first training set used to train the algorithm model, and the initial algorithm model is obtained by training on it. Because the initial material set is labeled manually, the labeling may additionally be verified by other people to ensure the reliability of the resulting initial algorithm model; this amounts to determining the initial algorithm model with reference to multiple independent checks.
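The application does not say how the verification by other people is carried out. One simple possibility, sketched here purely as an assumption, is to have a second person label the same items, keep only the labels both agree on, and use plain percent agreement as a rough reliability figure (a chance-corrected measure such as Cohen's kappa is a common alternative). The function name `cross_verify` is hypothetical.

```python
from typing import Dict, List, Tuple


def cross_verify(
    primary: Dict[str, str], reviewer: Dict[str, str]
) -> Tuple[Dict[str, str], List[str], float]:
    """Compare two people's labels for the same items.

    Returns the labels both agree on, the items they disagree on (to be
    resolved before training), and the simple percent agreement.
    """
    shared = [item for item in primary if item in reviewer]
    agreed = {item: primary[item] for item in shared if primary[item] == reviewer[item]}
    disputed = [item for item in shared if primary[item] != reviewer[item]]
    agreement = len(agreed) / len(shared) if shared else 0.0
    return agreed, disputed, agreement
```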

Once determined, the initial algorithm model serves as the algorithm model for the second material annotation, i.e., the reference model from which the next algorithm model is derived. After materials are labeled with reference to the initial algorithm model, the corresponding labeling result and a training set generated from it are obtained. This new training set differs from the initial training set, because the materials in the second annotation are often different from those in the first. The training set obtained by labeling with the initial algorithm model is then used as an update package for that model, so that the updated model covers more detail. The resulting model is no longer the initial algorithm model but a transition algorithm model. There is only one initial algorithm model, whereas there are usually multiple transition algorithm models, each obtained by updating the algorithm model after a round of material labeling. In other words, each round of material labeling refers to the algorithm model updated after the previous round, and after that round the updated model becomes the reference for the next round. Consequently, the more iterations are performed, the wider the coverage of the algorithm model, the more types and fields of material it handles, and the higher the accuracy of subsequent material labeling.
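The relationship between the single initial model and the successive transition models can be pictured as a version chain in which each round's training set widens the model's coverage. This sketch tracks only that bookkeeping; `ModelVersion`, `next_version`, and the domain names are hypothetical, and the actual retraining is left out.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class ModelVersion:
    version: int
    domains: FrozenSet[str]  # domains covered by the training data seen so far

    @property
    def kind(self) -> str:
        # Exactly one initial model (version 0); every later version is a transition model.
        return "initial" if self.version == 0 else "transition"


def next_version(current: ModelVersion, training_domains: Iterable[str]) -> ModelVersion:
    """Derive the model for the next round from the current one and the new training set."""
    return ModelVersion(current.version + 1, current.domains | frozenset(training_domains))


v0 = ModelVersion(0, frozenset({"banking"}))  # trained on manually labeled material
v1 = next_version(v0, ["banking"])
v2 = next_version(v1, ["banking", "e-commerce"])
```

Coverage only ever grows here, mirroring the statement that more iterations widen the model's coverage.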

In S101, the materials in the set of materials to be labeled are labeled according to the algorithm model.

The labeling here is the next iteration after the labeling of the previous material set. In an embodiment, labeling the materials in the set of materials to be labeled according to the algorithm model may include: determining, in the set, first materials that belong to the same domain as the algorithm model and second materials that belong to a different domain; labeling the first materials directly with the algorithm model; and labeling the second materials manually. That is, the materials to be labeled fall roughly into two categories. Materials in the same domain as the algorithm model (the first materials) can be labeled directly by the model; materials in a different domain (the second materials) cannot. Even within the same domain, the sub-categories may differ, so that some parts of the first materials cannot be labeled directly by the algorithm model; those parts can also be labeled manually. The second materials cannot be labeled directly because their domain differs from that of the algorithm model, and they are usually labeled manually.

The first and second materials in the set to be labeled are generally identified in advance by the material provider, since the domain of the materials is often known before labeling. If the provider does not specify it, the split can be made by keyword screening or with manual participation; alternatively, all materials can simply be assumed to be in the same domain and labeled directly, and the parts that cannot be labeled directly are then separated out as second materials of a different domain and labeled manually.
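Keyword screening, one of the options mentioned above, can be as simple as checking whether a material shares any word with a keyword list for the model's domain. The keyword list, the whitespace tokenization, and the function name below are assumptions for illustration; a real system would need proper tokenization (for Chinese text, a word segmenter) and a curated keyword list.

```python
from typing import Iterable, List, Set, Tuple


def split_by_keywords(
    materials: Iterable[str], domain_keywords: Set[str]
) -> Tuple[List[str], List[str]]:
    """Split materials into first materials (share a keyword with the model's
    domain) and second materials (no shared keyword, so label them manually)."""
    first: List[str] = []
    second: List[str] = []
    for text in materials:
        words = set(text.lower().split())
        if words & domain_keywords:
            first.append(text)
        else:
            second.append(text)
    return first, second


banking_keywords = {"account", "deposit", "loan", "transfer"}
first, second = split_by_keywords(
    ["How do I open an account", "Where is my parcel", "Loan interest rate"],
    banking_keywords,
)
```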

Refer to FIG. 2, which is a schematic diagram of material annotation. Material A is the initial material; it is labeled manually to generate training set A, and the algorithm model trained on training set A is the initial algorithm model. Material B belongs to the same domain as material A, i.e., the domain of the algorithm model, so it can be labeled directly by an automatic annotation device in which the algorithm model is integrated. Besides the algorithm model, the automatic annotation device includes other components needed for annotation, such as workflow and permission control functions. Material B', however, belongs to a different sub-category within that domain; it cannot be labeled directly by the algorithm model and is labeled manually. Material C belongs to a different domain from material A, i.e., it is inconsistent with the domain of the algorithm model, and it is labeled manually. Whether the labeling concerns material B, material B', or material C, a corresponding training set is eventually generated, and the algorithm model is updated through that training set to serve as the reference for the next material labeling.

In an embodiment, after multiple sets of materials have been labeled according to the algorithm model, whether the model's ability to label materials meets the standard is evaluated according to the proportion of first materials among the materials to be labeled and/or the accuracy of each round of labeling. In each round, the materials to be labeled are split into first and second materials according to whether their domain matches that of the algorithm model, and the proportion of first materials, which can be labeled directly, indicates the labeling ability of the model. In addition, checking the labels of each set after labeling gives the labeling accuracy, which likewise indicates the model's labeling ability. If the labeling ability turns out to be weak or below the standard, the model may need to be trained on further material sets to improve it gradually.
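The two indicators can be computed per round and compared with a standard. The thresholds `min_ratio` and `min_accuracy` and the function name are hypothetical, since the application does not define what "up to standard" means numerically; this sketch requires both indicators to pass, while the text also allows using either one alone.

```python
from typing import Tuple


def labeling_ability(
    total: int,
    auto_labeled: int,
    checked: int,
    correct: int,
    min_ratio: float = 0.8,
    min_accuracy: float = 0.95,
) -> Tuple[float, float, bool]:
    """Return (direct-labeling ratio, accuracy, whether both meet the thresholds).

    total:           materials in the set to be labeled
    auto_labeled:    materials the model labeled directly
    checked/correct: labels inspected after the round, and how many were right
    """
    ratio = auto_labeled / total if total else 0.0
    accuracy = correct / checked if checked else 0.0
    return ratio, accuracy, ratio >= min_ratio and accuracy >= min_accuracy
```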

In S102, a corresponding training set is generated based on the labeling result. The training set is what makes it possible to generate and update the algorithm model. Since the initial algorithm model has already been generated from the manual annotation of the initial material, subsequent training sets are used to update the algorithm model.

In S103, the algorithm model is updated through the training set for the next material annotation. Each round of material annotation is generally performed with reference to the algorithm model updated after the previous round. The more iterations, the wider the coverage, so manual participation is needed less often and labeling accuracy rises. To ensure the reliability of material annotation, updating the algorithm model through the training set may include: verifying the training set, and, after verification is completed, updating the algorithm model through the verified training set. In an embodiment, verifying the training set may include: randomly extracting part of the training set for spot-check verification; or verifying all contents of the training set in full.
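The two verification options, spot check and full check, differ only in which pairs are inspected. In this sketch `is_correct` is a hypothetical callback standing in for the human reviewer, `verify_training_set` is an illustrative name, and the fixed `seed` only makes the spot check reproducible.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Pair = Tuple[str, str]  # (material, label)


def verify_training_set(
    training_set: Sequence[Pair],
    is_correct: Callable[[Pair], bool],
    sample_size: Optional[int] = None,
    seed: int = 0,
) -> Tuple[bool, List[Pair]]:
    """Check the training set before it is used to update the model.

    sample_size=None (or >= len) checks every pair (full verification);
    otherwise a random sample of that size is checked (spot check).
    Returns (passed, list of pairs found to be wrong).
    """
    if sample_size is None or sample_size >= len(training_set):
        checked = list(training_set)
    else:
        checked = random.Random(seed).sample(list(training_set), sample_size)
    errors = [pair for pair in checked if not is_correct(pair)]
    return not errors, errors
```

A spot check is cheaper but can miss errors that fall outside the sample, so a full check may be preferable for the small, manually labeled initial set.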

This embodiment provides a material labeling method in which a generated algorithm model is determined and used for material labeling: the materials in the set to be labeled are labeled according to the algorithm model, a corresponding training set is generated from the labeling result, and the algorithm model is updated through the training set for the next material annotation. Because the algorithm model is updated with each labeling result, the embodiment reduces the workload of manual annotation and improves the consistency and accuracy of annotation.

Second embodiment

Refer to FIG. 3, which is a detailed flowchart of a material labeling method according to the second embodiment of the present application.

In S301, the set of materials to be labeled is determined.

The materials may include corpora in an intelligent question-answering system, text in text recognition, and multimedia materials such as audio, video, and pictures.

In S302, the materials to be labeled are labeled based on the algorithm model obtained after the previous round of material labeling.

Materials often contain rich content that a computer may not be able to identify and read directly, so they need to be annotated: annotation processes the materials in the material library so that the various features in each material are labeled in a computer-readable way.

Depending on the iteration stage, the algorithm model falls roughly into two types: the initial algorithm model and the transition algorithm model. The initial algorithm model is the first algorithm model in the material annotation process, and it broadly determines the algorithmic logic of all related material annotations that follow. A transition algorithm model is any algorithm model other than the initial one; unlike the initial algorithm model, transition algorithm models usually keep changing.

In S303, whether the labeling succeeded is judged. If so, the process proceeds to S304; if not, it proceeds to S307.

Judging whether labeling succeeded is the process of separating the first materials from the second materials. The first and second materials in the set to be labeled may be identified by keyword screening or with manual participation; alternatively, all materials can be assumed to be in the same domain and labeled directly, and the parts that cannot be labeled directly are separated out as second materials of a different domain and labeled manually.

In S304, a training set is generated based on the labeling result.

The training set is what makes it possible to generate and update the algorithm model. Since the initial algorithm model has already been generated from the manual annotation of the initial material, subsequent training sets are used to update the algorithm model.

In S305, the training set is verified.

To ensure the reliability of material labeling, the training set may be verified, either by randomly extracting part of the training set for spot-check verification, or by verifying all contents of the training set in full.

In S306, the algorithm model is updated through the verified training set, and the process returns to S301.

In S307, the materials that failed to be labeled are labeled manually.
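A loop over successive material sets makes the flowchart's branching explicit. This is a sketch under assumptions: the model is a hypothetical lookup table, `manual_label` and `verify` are caller-supplied callbacks standing in for people, and manually labeled items are assumed to join the training set, consistent with the first embodiment's statement that every labeling result yields a training set.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Model = Dict[str, str]  # hypothetical lookup-table stand-in for the algorithm model


def run_annotation_flow(
    batches: Iterable[List[str]],
    model: Model,
    manual_label: Callable[[str], str],
    verify: Callable[[Tuple[str, str]], bool],
) -> Tuple[Model, List[str]]:
    """Loop S301-S307 over successive material sets; return final model and manual items."""
    manual_items: List[str] = []
    for batch in batches:                       # S301: next set of materials
        training_set: List[Tuple[str, str]] = []
        for item in batch:
            label = model.get(item)             # S302: label with the latest model
            if label is None:                   # S303: labeling failed
                label = manual_label(item)      # S307: label manually
                manual_items.append(item)
            training_set.append((item, label))  # S304: build the training set
        verified = [pair for pair in training_set if verify(pair)]  # S305
        model = {**model, **dict(verified)}     # S306: update, then back to S301
    return model, manual_items
```

With batches `[["a", "b"], ["b", "c"]]` and a model that knows only `"a"`, item `"b"` needs a person in the first batch only; after the update, the second batch labels it automatically.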

Third embodiment

Refer to FIG. 4, which is a schematic diagram of a material labeling method according to the third embodiment of the present application. In this embodiment, the method is applied to bank business corpora: the corresponding algorithm model is trained and iterated in a loop to achieve automatic labeling. The implementation includes steps S401 to S408.

In S401, the first batch of business corpora, from bank A, is determined.

In S402, a corpus training set for bank A is formed.

In S403, an algorithm model is generated based on the corpus training set of bank A and embedded in the intelligent labeling system.

In S404, when the second batch of business corpora, from bank B, needs to be labeled, it is judged that the business corpora of bank A and bank B both belong to the banking field: they are different sub-categories of the same field, and most of their business terms and vocabulary are similar. The business corpora of bank B are therefore fed into the intelligent labeling system for automatic labeling. Depending on the size of the corpus to be labeled automatically, a distributed intelligent labeling system may be deployed.
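The judgment in S404 (and, conversely, in S407) rests on how similar the terms and vocabulary are. One way to make that concrete, offered only as an assumption since the application does not name a measure, is the Jaccard overlap between vocabularies compared with a hypothetical threshold. The vocabularies below are invented examples.

```python
from typing import Set


def vocabulary_overlap(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity: size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def choose_labeling_mode(new_vocab: Set[str], trained_vocab: Set[str], threshold: float = 0.3) -> str:
    """'automatic' if the new corpus looks like the trained field, else 'manual'."""
    return "automatic" if vocabulary_overlap(new_vocab, trained_vocab) >= threshold else "manual"


bank_a = {"account", "deposit", "loan", "transfer", "card"}
bank_b = {"account", "deposit", "loan", "credit", "card"}
ecommerce = {"order", "refund", "parcel", "coupon", "account"}
```

Here bank B shares four of six distinct terms with bank A (overlap of about 0.67) and would go to automatic labeling, while the e-commerce vocabulary shares one of nine (about 0.11) and would be routed to manual labeling.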

In S405, the part of bank B's business corpora that the algorithm model in the intelligent labeling system cannot label automatically forms the X' corpus, which is labeled manually; only this part needs manual effort, which reduces the workload and time of manual labeling. The manual labeling result forms the second batch of training sets, and steps S402 to S403 are performed again to update the algorithm model, so that the intelligent labeling system is optimized and expanded for the second time.

In S406, when the third batch of business corpora, from bank C, needs to be labeled, the operation of step S404 is repeated, which again updates the algorithm model, and the intelligent labeling system is optimized and expanded for the third time.

In S407, when the fourth batch, e-commerce customer service corpora, needs to be labeled, it is judged that these corpora and the already labeled bank corpora do not belong to the same field and that their terms and vocabulary differ greatly, so the e-commerce customer service corpora are labeled manually. This forms the fourth batch of training sets, and steps S402 to S403 are performed again to update the algorithm model, so that the intelligent labeling system is optimized and expanded for the fourth time.

In S408, when the fifth batch of e-commerce customer service corpora needs to be labeled, the operation of step S404 is repeated to update the algorithm model, and the intelligent labeling system is optimized and expanded for the fifth time.

When materials of the same type in the same field need to be labeled, the intelligent labeling system can label them automatically. After the intelligent labeling system has been iterated and optimized over many rounds, such corpora can in theory be labeled automatically with an accuracy that meets the standard.

Analysis can show the system's automatic labeling ratio and accuracy for different sub-categories of the same field and for the same sub-category of the same field, from which it can be judged whether richer corpora need to be collected to continue training the algorithm model.
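Grouping each round's records by field and sub-category yields the two figures mentioned above. The record layout and the function name are hypothetical; whatever the system actually logs, it needs at least the field, the sub-category, whether an item was labeled automatically, and the outcome of any check.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# (field, sub-category, labeled automatically?, correct? or None if not checked)
Record = Tuple[str, str, bool, Optional[bool]]


def per_category_stats(records: Iterable[Record]) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Map (field, sub-category) to (automatic labeling ratio, accuracy of checked labels)."""
    counts = defaultdict(lambda: [0, 0, 0, 0])  # total, automatic, checked, correct
    for field_name, sub_category, automatic, correct in records:
        c = counts[(field_name, sub_category)]
        c[0] += 1
        c[1] += int(automatic)
        if correct is not None:
            c[2] += 1
            c[3] += int(correct)
    return {
        key: (c[1] / c[0], c[3] / c[2] if c[2] else 0.0)
        for key, c in counts.items()
    }
```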

Fourth embodiment

Refer to FIG. 5, which is a schematic diagram of the composition of a material labeling device according to the fourth embodiment of the present application. The material labeling device includes a material labeling module 501, a training generation module 502, and an algorithm training module 503.

The material labeling module 501 is configured to label the materials in the set of materials to be labeled according to the preset algorithm model.

The training generation module 502 is configured to generate, based on the labeling result, a training set corresponding to the labeling result.

The algorithm training module 503 is configured to update the algorithm model through the training set for the next material annotation.
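The division of labor among the three modules can be sketched as three small classes wired together by a device object. Class and method names are hypothetical, the model is a lookup-table stand-in, and manual labeling is passed in as a callback because the device description does not assign it to any of the three modules.

```python
from typing import Callable, Dict, List, Tuple

Model = Dict[str, str]  # hypothetical lookup-table stand-in for the algorithm model
Pair = Tuple[str, str]


class MaterialLabelingModule:  # module 501
    def __init__(self, model: Model) -> None:
        self.model = model

    def label(self, materials: List[str]) -> Tuple[List[Pair], List[str]]:
        """Label what the model can; return (labeled pairs, items it could not label)."""
        labeled = [(m, self.model[m]) for m in materials if m in self.model]
        unlabeled = [m for m in materials if m not in self.model]
        return labeled, unlabeled


class TrainingGenerationModule:  # module 502
    def generate(self, labeled: List[Pair], manual: List[Pair]) -> List[Pair]:
        """Combine automatic and manual labeling results into one training set."""
        return labeled + manual


class AlgorithmTrainingModule:  # module 503
    def update(self, model: Model, training_set: List[Pair]) -> Model:
        """Return an updated model for the next round of annotation."""
        return {**model, **dict(training_set)}


class MaterialLabelingDevice:
    def __init__(self, model: Model) -> None:
        self.labeling = MaterialLabelingModule(model)
        self.generation = TrainingGenerationModule()
        self.training = AlgorithmTrainingModule()

    def run_round(self, materials: List[str], manual_label: Callable[[str], str]) -> List[Pair]:
        labeled, unlabeled = self.labeling.label(materials)
        manual = [(m, manual_label(m)) for m in unlabeled]
        training_set = self.generation.generate(labeled, manual)
        # Hand the updated model back to module 501 for the next round.
        self.labeling.model = self.training.update(self.labeling.model, training_set)
        return training_set
```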

In an embodiment, the materials to be labeled may include corpora in an intelligent question-answering system, text in text recognition, and multimedia materials such as audio, video, and pictures. These materials often contain rich content that a computer may not be able to identify and read directly, so they need to be annotated. Annotation processes the materials in a material library so that the various features in each material are labeled in a computer-recognizable manner. For example, the information in a picture may be labeled in the form of text; in face recognition, the facial features in an image are labeled with their pixel coordinates and pixel values; and for a corpus, various linguistic features are labeled on the corresponding language components to facilitate computer identification and reading. The way of labeling differs according to the application scenario, but in principle it makes multiple features of the material computer-identifiable according to some logic.

First, the algorithm model that has been generated is determined. The algorithm model is the algorithm referenced when annotating material, and the model referenced by each subsequent annotation round is the one determined after the previous round. The algorithm model is obtained by analyzing a training set. Depending on when it is generated, the algorithm model falls roughly into two types: the initial algorithm model and the transition algorithm model. The initial algorithm model is the first algorithm model in the material annotation process, and it broadly determines the algorithmic logic of all related material annotations that follow. A transition algorithm model is any algorithm model other than the initial one; unlike the initial algorithm model, transition algorithm models usually keep changing.

In an embodiment, determining the generated algorithm model may include: manually labeling the materials in an initial material set to generate an initial training set; the training generation module 502 generating an initial algorithm model based on the initial training set; the material labeling module 501 labeling the materials in a set of materials to be labeled with reference to the initial algorithm model; the algorithm training module 503 updating the initial algorithm model based on the labeling result to form a transition algorithm model; the material labeling module 501 labeling the materials of the next set of materials to be labeled with reference to the transition algorithm model; and the algorithm training module 503 then updating the transition algorithm model based on the labeling result. Material annotation and algorithm model updates are thus performed iteratively, and the algorithm model is determined in this way. These steps describe the general way the algorithm model is generated; in an embodiment, the model starts from the initial algorithm model and takes shape over several iterations of annotation and updating.

One way to generate the initial algorithm model is to first label the materials in the initial material set manually. This manual labeling does not refer to any algorithm model; how to label the multiple features of the material is decided by human judgment. After labeling is completed, the corresponding initial training set is generated from the labeling result. A training set is the set used to train and generate an algorithm model; it often contains a large number of objects, and training on these objects yields the desired algorithm model. The initial training set is the first training set used to train the algorithm model, and the initial algorithm model is obtained by training on it. Because the initial material set is labeled manually, the labeling may additionally be verified by other people to ensure the reliability of the resulting initial algorithm model; this amounts to determining the initial algorithm model with reference to multiple independent checks.

The material labeling module 501 is configured to label the materials in the set of materials to be labeled according to the algorithm model. The labeling here is the next iteration after the labeling of the previous material set. In an embodiment, labeling the materials in the set according to the algorithm model may include: determining, in the set, first materials that belong to the same domain as the algorithm model and second materials that belong to a different domain; labeling the first materials directly with the algorithm model; and labeling the second materials manually. Materials in the same domain as the algorithm model (the first materials) can be labeled directly by the model, whereas materials in a different domain (the second materials) cannot. Even within the same domain, the sub-categories may differ, so that some parts cannot be labeled directly; those parts can also be labeled manually. The second materials cannot be labeled directly because their domain differs from that of the algorithm model, and they are often labeled manually. The first and second materials in the set may be identified by keyword screening or similar means, or with manual participation; alternatively, all materials can be assumed to be in the same domain and labeled directly, and the parts that cannot be labeled directly are separated out as second materials of a different domain and labeled manually.

Refer to FIG. 2, which is a schematic diagram of material annotation. Material A is the initial material; it is labeled manually to generate training set A, and the algorithm model trained on training set A is the initial algorithm model. Material B belongs to the same domain as material A, i.e., the domain of the algorithm model, so it can be labeled directly by the algorithm model. Material B', however, belongs to a different sub-category within that domain; it cannot be labeled directly by the algorithm model and is labeled manually. Material C belongs to a different domain from material A, i.e., it is inconsistent with the domain of the algorithm model, and it is labeled manually. Whether the labeling concerns material B, material B', or material C, a corresponding training set is eventually generated, and the algorithm model is updated through that training set to serve as the reference for the next material labeling.

The training generation module 502 is configured to generate a corresponding training set based on the annotation results. Generating the training set is what makes it possible to build the algorithm model and to update it. Because the initial algorithm model has already been generated from the manual annotation of the initial material, each subsequent training set is used to update the algorithm model.

The algorithm training module 503 is configured to update the algorithm model with the training set for the next material annotation. Each material annotation generally refers to the algorithm model as updated by the previous annotation. The more iterations are performed, the wider the coverage, the fewer manual interventions are required, and the higher the labeling accuracy. To ensure the reliability of the material annotation, updating the algorithm model with the training set may include: verifying the training set based on the annotation results; and, after the verification is completed, updating the algorithm model with the verified training set. In an embodiment, verifying the training set based on the annotation results may include: randomly sampling part of the training set for verification; or verifying the entire contents of the training set in full.
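The two verification options (random sampling or full verification) and the rule that only the verified training set updates the model can be sketched as follows. This is a minimal sketch, assuming the training set is a list of (text, label) pairs and that a hypothetical `reviewer` callable confirms or corrects a label; `verify_training_set`, `update_after_verification`, `sample_size` and `seed` are illustrative names, not taken from the disclosure.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Pair = Tuple[str, str]  # (material text, label)


def verify_training_set(
    training_set: Sequence[Pair],
    reviewer: Callable[[str, str], str],
    mode: str = "random",
    sample_size: int = 10,
    seed: Optional[int] = None,
) -> List[Pair]:
    """Return a verified copy of the training set.

    reviewer(text, label) returns the confirmed or corrected label (a
    hypothetical human or QA check). In "random" mode only a random sample
    of pairs is reviewed and the rest pass through unchanged; in "full"
    mode every pair is reviewed.
    """
    if mode == "full":
        indices = list(range(len(training_set)))
    elif mode == "random":
        k = min(sample_size, len(training_set))
        indices = random.Random(seed).sample(range(len(training_set)), k)
    else:
        raise ValueError(f"unknown verification mode: {mode!r}")
    verified = list(training_set)  # copy; the input is not modified
    for i in indices:
        text, label = verified[i]
        verified[i] = (text, reviewer(text, label))
    return verified


def update_after_verification(
    update_model: Callable[[List[Pair]], None],
    training_set: Sequence[Pair],
    reviewer: Callable[[str, str], str],
    **verify_kwargs,
) -> List[Pair]:
    """Verify first; only the verified training set reaches the model update."""
    verified = verify_training_set(training_set, reviewer, **verify_kwargs)
    update_model(verified)
    return verified
```

Random sampling trades thoroughness for reviewer time. Full verification costs more but catches every error before it propagates into the next round's model.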

This embodiment provides a material labeling device that determines a generated algorithm model used for material labeling, labels the materials in the set of materials to be labeled according to the algorithm model, generates a corresponding training set based on the labeling results, and updates the algorithm model with the training set for the next material annotation. With this embodiment, the algorithm model is updated with each round of labeling results, which reduces the workload of manual annotation and improves the consistency and accuracy of the annotation.

Fifth embodiment

Please refer to FIG. 6. FIG. 6 is a schematic structural diagram of a terminal according to a fifth embodiment of the present disclosure, including:

a processor 601, a memory 602, and a communication bus 603. The communication bus 603 is configured to implement connection and communication between the processor 601 and the memory 602; the processor 601 is configured to execute the material labeling program stored in the memory 602 to implement the material labeling method of the foregoing embodiments, which are not described herein again.

In addition, this embodiment further provides a computer readable storage medium storing one or more computer programs, which can be executed by one or more processors to implement the material labeling method of the foregoing embodiments; details are not described herein again.

Those skilled in the art should understand that the above modules or steps of the present application can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed over a network composed of multiple computing devices. Alternatively, they may be implemented by program code executable by a computing device, so that they can be stored in a storage medium (ROM/RAM, magnetic disk, optical disk) and executed by the computing device. In some cases, the steps shown or described may be performed in an order different from that described herein; alternatively, they may be fabricated separately as individual integrated circuit modules, or several of the modules or steps may be implemented as a single integrated circuit module. Therefore, the present application is not limited to any particular combination of hardware and software.

Claims

A material annotation method, comprising:

Labeling the materials in a set of materials to be labeled according to a preset algorithm model;

Generating, based on the annotation result, a training set corresponding to the annotation result; and

Updating the preset algorithm model with the training set for use in the next material annotation.
The method of claim 1, wherein labeling the materials in the set of materials to be labeled according to the preset algorithm model comprises:

Determining, in the set of materials to be labeled, a first material in the same domain as the preset algorithm model and a second material in a different domain from the preset algorithm model;

Labeling the first material directly with the preset algorithm model; and

Labeling the second material manually.
The method of claim 2, wherein labeling the first material directly with the preset algorithm model comprises:

Manually labeling any portion of the first material that cannot be labeled by the preset algorithm model.
The method of any one of claims 1-3, further comprising:

When labeling the materials in the set of materials to be labeled according to the preset algorithm model, evaluating each annotation according to its accuracy.
The method of any one of claims 1-3, wherein updating the preset algorithm model with the training set comprises:

Verifying the training set; and

After the verification is completed, updating the preset algorithm model with the verified training set.
The method of claim 5, wherein verifying the training set comprises:

Randomly sampling a portion of the training set for verification; or verifying the entire contents of the training set in full.
A material labeling device, comprising:

a material labeling module configured to label the materials in a set of materials to be labeled according to a preset algorithm model;

a training generation module configured to generate, based on the annotation result, a training set corresponding to the annotation result; and

an algorithm training module configured to update the preset algorithm model with the training set for use in the next material annotation.
The apparatus of claim 7, wherein the material labeling module is further configured to:

determine, in the set of materials to be labeled, a first material in the same domain as the preset algorithm model and a second material in a different domain from the preset algorithm model;

label the first material directly with the preset algorithm model; and

label the second material manually.
The apparatus of claim 8, wherein the material labeling module is further configured to:

manually label any portion of the first material that cannot be labeled by the preset algorithm model.
The apparatus of any one of claims 7-9, wherein the algorithm training module is configured to update the preset algorithm model with the training set by:

verifying the training set; and

after the verification is completed, updating the preset algorithm model with the verified training set.
The apparatus of claim 10, wherein the algorithm training module is configured to verify the training set by:

randomly sampling a portion of the training set for verification; or verifying the entire contents of the training set in full.
A terminal, comprising a processor, a memory, and a communication bus, wherein the communication bus is configured to implement connection and communication between the processor and the memory, and the processor is configured to execute a material annotation program stored in the memory to implement the material labeling method according to any one of claims 1 to 6.
A computer readable storage medium storing at least one computer program, the at least one computer program being executable by at least one processor to implement the material labeling method according to any one of claims 1 to 6.