CN114970724A - Data labeling method, device, equipment and storage medium

Data labeling method, device, equipment and storage medium

Info

Publication number
CN114970724A
Authority
CN
China
Prior art keywords
labeling
models
model
result
results
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210598807.4A
Other languages
Chinese (zh)
Inventor
李超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210598807.4A priority Critical patent/CN114970724A/en
Publication of CN114970724A publication Critical patent/CN114970724A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides a data labeling method, apparatus, device, and storage medium, and relates to the field of data processing technology, in particular to the fields of artificial intelligence, speech technology, and deep learning. The specific implementation scheme is as follows: target data is labeled with a plurality of models respectively to obtain a plurality of labeling results, wherein the labeling result of at least one of the models is obtained according to the labeling result of at least one other of the models; and the final labeling result of the target data is determined according to the frequency of occurrence of the plurality of labeling results. According to the disclosed scheme, a more accurate labeling result for the target data can be obtained.

Description

Data labeling method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of data processing technology, and in particular, to the field of artificial intelligence, speech technology, and deep learning technology.
Background
With the advent of the artificial intelligence era, machine learning is being applied in more and more fields. In supervised machine learning, the problem of acquiring high-quality training data must be solved first, since the subsequent training process can only be carried out with labeled data.
Disclosure of Invention
The disclosure provides a data annotation method, a device, equipment and a storage medium.
According to an aspect of the present disclosure, there is provided a method of data annotation, including:
labeling target data with a plurality of models respectively to obtain a plurality of labeling results, wherein the labeling result of at least one of the models is obtained according to the labeling result of at least one other of the models; and
determining a final labeling result of the target data according to the frequency of occurrence of the plurality of labeling results.
According to another aspect of the present disclosure, there is provided an apparatus for data annotation, including:
a labeling module, configured to label target data with a plurality of models respectively to obtain a plurality of labeling results, wherein the labeling result of at least one of the models is obtained according to the labeling result of at least one other of the models; and
a determining module, configured to determine a final labeling result of the target data according to the frequency of occurrence of the plurality of labeling results.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method of any of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a method according to any one of the embodiments of the present disclosure.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements a method according to any of the embodiments of the present disclosure.
According to the disclosed scheme, a more accurate labeling result for the target data can be obtained.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic flow diagram of a method of data annotation according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of an application scenario of a method for data annotation according to an embodiment of the present disclosure;
FIG. 3 is a flowchart illustrating step S101 of a method for data annotation according to an embodiment of the disclosure;
FIG. 4 is a flowchart illustrating a step S101 of a method for annotating data according to another embodiment of the present disclosure;
FIG. 5 is a schematic diagram of an apparatus for data annotation according to an embodiment of the present disclosure;
FIG. 6 is a block diagram of an electronic device for implementing a method of data annotation of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
An embodiment of the present disclosure provides a data labeling method. As shown in FIG. 1, which is a flowchart of the data labeling method of this embodiment, the method may include the following steps:
S101: label the target data with a plurality of models respectively to obtain a plurality of labeling results, wherein the labeling result of at least one of the models is obtained according to the labeling result of at least one other of the models; and
S102: determine the final labeling result of the target data according to the frequency of occurrence of the plurality of labeling results.
The target data may be voice data, image data, text data, or the like. When the target data is voice data, the text content corresponding to the voice data can be determined from the labeling result. When the target data is image data, the labeling result can indicate which contents are contained in the image, which specific persons appear in the image, and so on. When the target data is text data, the labeling result can indicate which domain the content of the text belongs to, the type of the text data, and so on.
The model used can be understood as software or a system with a labeling function, and can also be understood as a pre-trained neural network model with a labeling function. After each model labels the target data, at least one labeling result can be obtained.
The plurality of models may include a model A, which obtains a preliminary labeling result from the target data and then adjusts the preliminary labeling result according to the labeling results of one or more other models, thereby obtaining model A's actual labeling result for the target data. The plurality of models may also include a model B, which obtains its labeling result directly from the target data without referring to any other labeling result. The plurality of models includes at least one model of the A type, and may include no model of the B type.
The frequency of occurrence of the plurality of labeling results can be understood as the number of times each labeling result occurs. For example, if four models output four labeling results, where the labeling results of the first and second models are both a, the labeling result of the third model is b, and the labeling result of the fourth model is c, then labeling result a occurs twice, while labeling results b and c each occur once.
The determined final annotation result can be a single annotation result or a plurality of different annotation results.
According to the disclosed scheme, because some models refer to the labeling results of other models during labeling and adjust their own labeling results accordingly, the labeling result obtained by each model can be more accurate, which further ensures that the final labeling result obtained from the labeling results of all the models is more accurate. At the same time, because some models refer to the labeling results of other models, the labeling results of the individual models are prevented from being overly dispersed and inconsistent, so that after the target data is labeled by the plurality of models, the final labeling result can be reached through fast convergence.
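As an illustration only, steps S101 and S102 can be sketched in Python as follows. The `label` and `adjust` callables, the string representation of a labeling result, and the choice to hand each model all earlier results are assumptions made for this sketch rather than anything prescribed by the disclosure.

```python
from collections import Counter
from typing import Callable, List, Tuple

# A "model" is sketched as a pair of callables (neither is defined by the
# disclosure): label(target) returns a preliminary labeling result, and
# adjust(preliminary, earlier_results) returns the actual labeling result
# after consulting the labeling results of other models.
Model = Tuple[Callable[[str], str], Callable[[str, List[str]], str]]

def annotate(target: str, models: List[Model]) -> str:
    results: List[str] = []
    for label, adjust in models:
        preliminary = label(target)            # S101: preliminary labeling result
        actual = adjust(preliminary, results)  # adjust using earlier results (may be empty)
        results.append(actual)
    # S102: the labeling result that occurs most frequently becomes the final result.
    final, _ = Counter(results).most_common(1)[0]
    return final

# Toy usage: three dummy models whose adjust step falls back to the most
# recent earlier result when the model itself produced nothing.
def make_model(answer: str) -> Model:
    return (lambda target: answer,
            lambda prelim, earlier: prelim or (earlier[-1] if earlier else prelim))

models = [make_model("sunny"), make_model("sunny"), make_model("cloudy")]
print(annotate("weather_clip.wav", models))    # -> "sunny" (occurs twice)
```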
The data labeling method provided by the embodiments of the present disclosure can be applied to the scenario framework shown in FIG. 2. In FIG. 2, 10 denotes a terminal, 20 denotes a server, and 30 denotes a distributed computer system. The data labeling method of the present disclosure may be performed jointly by the terminal 10, the server 20, and the distributed computer system 30, or by any one or more of them. The terminal 10 may be used to report or transmit target data to the server 20 or the distributed computer system 30. After the server 20 or the distributed computer system 30 completes the disclosed data labeling method using the plurality of models, the final labeling result can be fed back to the terminal 10. The terminal 10, the server 20, or the distributed computer system 30 may also perform the disclosed data labeling method on the target data using the plurality of models.
In one embodiment, the data labeling method provided by the embodiments of the present disclosure includes steps S101 and S102, where step S101 (labeling the target data with the plurality of models respectively to obtain a plurality of labeling results, the labeling result of at least one of the models being obtained according to the labeling result of at least one other of the models) may further include:
sequentially labeling the target data with the plurality of models respectively to obtain a plurality of labeling results, wherein the labeling result of a later model among the plurality of models is obtained according to the labeling result of at least one earlier model.
It should be noted that the plurality of models labeling the target data sequentially and respectively can be understood as follows: only after a former model has labeled the target data and obtained its labeling result does a latter model label the target data.
An earlier model may be understood as the model immediately preceding the later model, or as any one or more models preceding the later model.
The labeling result of the later model being obtained according to the labeling result of at least one earlier model can be understood as follows: after the later model A obtains a preliminary labeling result from the target data, it adjusts the preliminary labeling result according to the labeling result of an earlier model B (i.e., a model that labeled the target data before model A), thereby obtaining model A's actual labeling result for the target data. It can also be understood as follows: after the later model A obtains a preliminary labeling result from the target data, it adjusts the preliminary labeling result according to the labeling results of earlier models B and C, thereby obtaining model A's actual labeling result for the target data.
According to the disclosed scheme, because the later model refers to the labeling results of earlier models during labeling and adjusts its own labeling result accordingly, the labeling result obtained by the later model can be more accurate, which further ensures that the final labeling result obtained from the labeling results of all the models is more accurate. At the same time, because the later model refers to the labeling results of earlier models, the labeling results of the individual models are prevented from being overly dispersed and inconsistent, so that after the target data is labeled by the plurality of models, the final labeling result can be reached through fast convergence.
In one example, affected by the audio quality of the speech, the accent of the speaker, or ambient noise, each model may produce a different labeling result for the same sentence of speech. To avoid the situation in which no consistent labeling result can be obtained, the sentence of speech can be labeled using the data labeling method of the embodiments of the present disclosure. As shown in FIG. 3, the plurality of models includes a first model, a second model, a third model, a fourth model, and a fifth model, and the voice data (target data) is labeled by these five models. The first model labels the voice data and obtains a first labeling result. When the second model labels the voice data, it adjusts its preliminary labeling result with reference to the first labeling result to obtain a second labeling result. When the third model labels the voice data, it adjusts its preliminary labeling result with reference to the second labeling result to obtain a third labeling result. When the fourth model labels the voice data, it adjusts its preliminary labeling result with reference to the third labeling result to obtain a fourth labeling result. When the fifth model labels the voice data, it adjusts its preliminary labeling result with reference to the fourth labeling result to obtain a fifth labeling result.
In a specific example, the first labeling result obtained by the first model indicates that the content of the voice data includes five characters, A1 through A5: 今, 天, 天, 气, 晴 (i.e., "今天天气晴", "the weather is sunny today"). The preliminary labeling result obtained by the second model includes six characters, B1 through B6, some of which are misrecognized. On this basis, the preliminary result is adjusted according to the five characters of the first labeling result to obtain the second labeling result of the second model, B1 through B6: 今, 天, 天, 气, 晴, 朗 (i.e., "今天天气晴朗", "the weather is fine today").
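One way such an adjustment could work is sketched below, purely as an assumption for illustration: the later model is taken to emit per-character confidence scores, keeps the characters it is confident about, and borrows a character from the earlier result where it is unsure. The disclosure only states that the preliminary result is adjusted according to the earlier result; the rule, the 0.6 threshold, and the per-character representation are not part of it, and the example input only loosely mirrors the misrecognitions in the worked examples above.

```python
from typing import List, Tuple

# A preliminary labeling result is represented here as (character, confidence)
# pairs; the threshold and the position-wise fallback rule are assumptions.
Scored = List[Tuple[str, float]]

def adjust_with_reference(preliminary: Scored, reference: str,
                          threshold: float = 0.6) -> str:
    adjusted = []
    for i, (char, confidence) in enumerate(preliminary):
        if confidence >= threshold or i >= len(reference):
            adjusted.append(char)          # keep the model's own character
        else:
            adjusted.append(reference[i])  # unsure: borrow from the earlier result
    return "".join(adjusted)

# The later model is unsure about its first character and corrects it
# using the earlier model's result.
first_result = "今天天气晴"
second_preliminary = [("京", 0.4), ("天", 0.9), ("天", 0.9),
                      ("气", 0.9), ("晴", 0.8), ("朗", 0.7)]
print(adjust_with_reference(second_preliminary, first_result))  # -> 今天天气晴朗
```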
In one embodiment, the data labeling method provided by the embodiments of the present disclosure includes steps S101 and S102, where step S101 (labeling the target data with the plurality of models respectively to obtain a plurality of labeling results, the labeling result of at least one of the models being obtained according to the labeling result of at least one other of the models) may further include:
sequentially labeling the target data with the plurality of models respectively to obtain a plurality of labeling results, wherein, starting from the third model among the plurality of models, each labeling result is obtained according to the labeling results of two earlier models.
It should be noted that the two earlier models can be understood as any two models before the current model, or as the two models immediately preceding the current model in order. For example, given five models A, B, C, D, and E, the two earlier models on which model D relies may be models B and C, models A and B, or models A and C.
According to the disclosed scheme, because each model starting from the third refers to the labeling results of two earlier models during labeling and adjusts its own labeling result on this basis, the labeling results obtained by the third and subsequent models are more accurate, which further ensures that the final labeling result obtained from the labeling results of all the models is more accurate. At the same time, because the later models refer to the labeling results of earlier models, the labeling results of the individual models are prevented from being overly dispersed and inconsistent, so that after the target data is labeled by the plurality of models, the final labeling result can be reached through fast convergence.
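A structural sketch of this variant, under the same hypothetical `label`/`adjust` interface as in the earlier sketch, is given below; taking the two most recent results as the two earlier results is only one of the choices this embodiment allows.

```python
from typing import Callable, List, Tuple

Model = Tuple[Callable[[str], str], Callable[[str, List[str]], str]]

def annotate_with_two_references(target: str, models: List[Model]) -> List[str]:
    """Label sequentially; from the third model onward each model adjusts its
    preliminary result using the results of two earlier models (here simply
    the two most recent ones)."""
    results: List[str] = []
    for index, (label, adjust) in enumerate(models):
        preliminary = label(target)
        if index < 2:
            actual = preliminary                        # first and second model: no references
        else:
            actual = adjust(preliminary, results[-2:])  # two earlier labeling results
        results.append(actual)
    return results
```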
In one embodiment, the data labeling method provided by the embodiments of the present disclosure includes steps S101 and S102, where step S101 (labeling the target data with the plurality of models respectively to obtain a plurality of labeling results, the labeling result of at least one of the models being obtained according to the labeling result of at least one other of the models) may further include:
sequentially labeling the target data with the plurality of models respectively to obtain a plurality of labeling results, wherein, starting from the third model among the plurality of models, each labeling result is obtained according to the labeling results of two earlier models, and, starting from the fourth model among the plurality of models, one of the two earlier labeling results relied upon is the labeling result of the model immediately preceding the current model.
According to the disclosed scheme, because each model starting from the third refers to the labeling results of two earlier models during labeling and adjusts its own labeling result on this basis, the labeling results obtained by the third and subsequent models are more accurate, which further ensures that the final labeling result obtained from the labeling results of all the models is more accurate. At the same time, because the later models refer to the labeling results of earlier models, the labeling results of the individual models are prevented from being overly dispersed and inconsistent, so that after the target data is labeled by the plurality of models, the final labeling result can be reached through fast convergence.
In one example, affected by the audio quality of the speech, the accent of the speaker, or ambient noise, each model may produce a different labeling result for the same sentence of speech. To avoid the situation in which no consistent labeling result can be obtained, the sentence of speech can be labeled using the data labeling method of the embodiments of the present disclosure. As shown in FIG. 4, the plurality of models includes a first model, a second model, a third model, a fourth model, and a fifth model, and the voice data (target data) is labeled by these five models. The first model labels the voice data and obtains a first labeling result. The second model labels the voice data and obtains a second labeling result. When the third model labels the voice data, it adjusts its preliminary labeling result with reference to the first and second labeling results to obtain a third labeling result. When the fourth model labels the voice data, it adjusts its preliminary labeling result with reference to the third and second labeling results to obtain a fourth labeling result. When the fifth model labels the voice data, it adjusts its preliminary labeling result with reference to the third and fourth labeling results to obtain a fifth labeling result.
In a specific example, the first labeling result obtained by the first model indicates that the content of the voice data includes five characters, A1 through A5: 今, 天, 天, 气, 晴 ("今天天气晴"). The second labeling result obtained by the second model indicates that the content of the voice data includes six characters, B1 through B6: 今, 天, 天, 气, 晴, 朗 ("今天天气晴朗"). The preliminary labeling result obtained by the third model from the voice data includes six characters, C1 through C6, in which the first and third characters are misrecognized (京 in place of 今 and 田 in place of 天). On this basis, the preliminary result is adjusted according to the first and second labeling results to obtain the third labeling result of the third model, C1 through C6: 今, 天, 天, 气, 晴, 朗.
In one example, starting from the fourth model among the plurality of models, obtaining the labeling result of the fourth model according to the labeling results of two earlier models (labeling result A and labeling result B) includes:
obtaining the third labeling result of the third model, which is the model immediately preceding the fourth model, and taking the third labeling result as labeling result A;
obtaining the first labeling result of the first model and the second labeling result of the second model, and determining whichever of them differs more from the third labeling result as labeling result B (e.g., the first labeling result); and
adjusting the preliminary labeling result obtained by the fourth model from the target data according to labeling result A (namely, the third labeling result) and labeling result B (namely, the first labeling result) to obtain the fourth labeling result of the fourth model.
It should be noted that, when the plurality of models includes a fifth model, a sixth model, or even more models, those models may select their earlier labeling results (labeling result A and labeling result B) in the same manner as the fourth model does.
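The selection of labeling results A and B described above can be sketched as follows. The difference measure (one minus a difflib similarity ratio) is an assumption made for illustration; the example mirrors FIG. 4, where labeling result B for the fourth model turns out to be the first labeling result.

```python
import difflib
from typing import List, Tuple

def pick_references(earlier_results: List[str]) -> Tuple[str, str]:
    """For the fourth model or later: labeling result A is the result of the
    immediately preceding model; labeling result B is the earlier result that
    differs most from A (difference measured here with difflib, an assumption)."""
    result_a = earlier_results[-1]
    candidates = earlier_results[:-1]
    result_b = max(
        candidates,
        key=lambda r: 1.0 - difflib.SequenceMatcher(None, result_a, r).ratio(),
    )
    return result_a, result_b

# Before the fourth model runs in FIG. 4, three labeling results exist.
first, second, third = "今天天气晴", "今天天气晴朗", "今天天气晴朗"
a, b = pick_references([first, second, third])
print(a, b)   # A is the third result; B is the first result (it differs most from A)
```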
In one embodiment, the data labeling method provided by the embodiments of the present disclosure includes steps S101 and S102, where step S102 (determining the final labeling result of the target data according to the frequency of occurrence of the plurality of labeling results) may further include:
determining, according to the frequency of occurrence of the plurality of labeling results, the labeling result with the highest frequency of occurrence as the final labeling result of the target data.
It should be noted that determining the final labeling result of the target data according to the frequency of occurrence of the plurality of labeling results can be understood as follows: after all of the models have obtained their labeling results, the labeling result with the highest frequency of occurrence is selected as the final labeling result. It can also be understood as follows: when only some of the models have obtained labeling results and the remaining models have not, if the frequency of occurrence of one labeling result among the models that have finished already exceeds half of the total number of models, that labeling result can be determined as the final labeling result at that point.
According to the disclosed scheme, selecting the labeling result with the highest frequency of occurrence as the final labeling result of the target data helps ensure the accuracy of the final labeling result.
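The frequency-based decision, including the early case where one labeling result already occurs in more than half of all models before every model has finished, might look like the minimal sketch below. The early-stop rule follows the explanation above; falling back to the most frequent result when no majority is reached is an assumption about the remaining case.

```python
from collections import Counter
from typing import List, Optional

def final_result(results_in_order: List[str], total_models: int) -> Optional[str]:
    """Return the final labeling result: as soon as one result has occurred in
    more than half of all models it is fixed early; otherwise, once every
    model has finished, the most frequent result is chosen."""
    counts: Counter = Counter()
    for result in results_in_order:
        counts[result] += 1
        if counts[result] > total_models / 2:
            return result                       # early decision
    if len(results_in_order) == total_models and counts:
        return counts.most_common(1)[0][0]      # all models finished: most frequent wins
    return None                                 # not enough results yet

# With five models in total, the final result is already fixed after three agreeing models.
print(final_result(["sunny", "sunny", "sunny"], total_models=5))  # -> sunny
```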
In one example, if a model is software or a system with a labeling function, then when a later model adjusts its preliminary labeling result based on the labeling result of an earlier model, the computation parameters of the software or system may be modified based on that earlier labeling result, and the labeling result is then obtained again.
In one example, if a model is a pre-trained neural network model with a labeling function, then when a later model adjusts its preliminary labeling result based on the labeling result of an earlier model, the parameters or weights of the neural network model may be modified based on that earlier labeling result, and the labeling result is then obtained again.
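As a purely illustrative sketch of the neural-network case, the toy classifier below treats the earlier model's labeling result as a pseudo-label, performs a single gradient step, and then labels the target again. The feature vector, the class inventory, the learning rate, and the single-step update are all assumptions made for this sketch, not part of the disclosure.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
classes = ["sunny", "cloudy", "rainy"]      # hypothetical label inventory
model = torch.nn.Linear(8, len(classes))    # toy "later model"
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)

features = torch.randn(1, 8)                # stand-in for features of the target data
previous_result = "sunny"                   # labeling result of the earlier model

# Preliminary labeling result of this model.
preliminary = classes[model(features).argmax(dim=1).item()]

# Nudge the weights toward the earlier model's result, then label again.
target_index = torch.tensor([classes.index(previous_result)])
loss = F.cross_entropy(model(features), target_index)
optimizer.zero_grad()
loss.backward()
optimizer.step()

adjusted = classes[model(features).argmax(dim=1).item()]
print(preliminary, "->", adjusted)
```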
An embodiment of the present disclosure provides a data labeling apparatus, as shown in fig. 5, which is a block diagram of a data labeling apparatus of this embodiment, and the apparatus may include:
a labeling module 510, configured to label the target data with a plurality of models respectively to obtain a plurality of labeling results, where the labeling result of at least one of the models is obtained according to the labeling result of at least one other of the models; and
a determining module 520, configured to determine the final labeling result of the target data according to the frequency of occurrence of the plurality of labeling results.
In an embodiment, the labeling module 510 is configured to label the target data sequentially with the plurality of models respectively to obtain a plurality of labeling results, where the labeling result of a later model among the plurality of models is obtained according to the labeling result of at least one earlier model.
In one embodiment, the labeling module 510 is configured to label the target data sequentially with the plurality of models respectively to obtain a plurality of labeling results, where, starting from the third model among the plurality of models, each labeling result is obtained according to the labeling results of two earlier models.
In one embodiment, the labeling module 510 is configured to label the target data sequentially with the plurality of models respectively to obtain a plurality of labeling results, where, starting from the third model among the plurality of models, each labeling result is obtained according to the labeling results of two earlier models, and, starting from the fourth model among the plurality of models, one of the two earlier labeling results relied upon is the labeling result of the model immediately preceding the current model.
In one embodiment, the determining module 520 is configured to determine, according to the frequency of occurrence of the plurality of labeling results, the labeling result with the highest frequency of occurrence as the final labeling result of the target data.
In one embodiment, the target data is any one of voice data, image data, or text data.
For a description of specific functions and examples of each module and sub-module of the apparatus in the embodiment of the present disclosure, reference may be made to the description of corresponding steps in the foregoing method embodiments, and details are not repeated here.
In the technical scheme of the present disclosure, the acquisition, storage, and application of any personal information of the users involved comply with the provisions of relevant laws and regulations and do not violate public order and good customs.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 6 illustrates a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in FIG. 6, the device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 can also be stored. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, or the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 601 performs the respective methods and processes described above, such as the method of data labeling. For example, in some embodiments, the method of data labeling may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the above-described method of data labeling may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured by any other suitable means (e.g., by means of firmware) to perform the method of data labeling.
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/acts specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (15)

1. A method of data annotation, comprising:
respectively labeling the target data by utilizing a plurality of models to obtain a plurality of labeling results, wherein the labeling result of at least one model in the plurality of models is obtained according to the labeling result of at least one other model in the plurality of models; and
determining the final labeling result of the target data according to the occurrence frequency of the plurality of labeling results.
2. The method of claim 1, wherein labeling the target data by utilizing the plurality of models respectively to obtain the plurality of labeling results, the labeling result of at least one model in the plurality of models being obtained according to the labeling result of at least one other model in the plurality of models, comprises:
sequentially and respectively labeling the target data by utilizing the plurality of models to obtain the plurality of labeling results, wherein the labeling result of a later model in the plurality of models is obtained according to the labeling result of at least one earlier model.
3. The method of claim 1, wherein labeling the target data by utilizing the plurality of models respectively to obtain the plurality of labeling results, the labeling result of at least one model in the plurality of models being obtained according to the labeling result of at least one other model in the plurality of models, comprises:
sequentially and respectively labeling the target data by utilizing the plurality of models to obtain the plurality of labeling results, wherein, starting from the third model in the plurality of models, the labeling results are obtained according to the labeling results of two earlier models.
4. The method of claim 1, wherein labeling the target data by utilizing the plurality of models respectively to obtain the plurality of labeling results, the labeling result of at least one model in the plurality of models being obtained according to the labeling result of at least one other model in the plurality of models, comprises:
sequentially and respectively labeling the target data by utilizing the plurality of models to obtain the plurality of labeling results, wherein, starting from the third model in the plurality of models, the labeling results are obtained according to the labeling results of two earlier models, and, starting from the fourth model in the plurality of models, one of the labeling results of the two earlier models relied upon is the labeling result of the model immediately preceding the current model.
5. The method according to any one of claims 1 to 4, wherein the determining a final annotation result of the target data according to the frequency of occurrence of the plurality of annotation results comprises:
determining, according to the occurrence frequency of the plurality of labeling results, the labeling result with the highest occurrence frequency as the final labeling result of the target data.
6. The method according to any one of claims 1 to 4, wherein the target data is any one of voice data, image data, or text data.
7. An apparatus for data annotation, comprising:
the labeling module is used for labeling the target data by utilizing a plurality of models respectively to obtain a plurality of labeling results, wherein the labeling result of at least one model in the plurality of models is obtained according to the labeling result of at least one other model in the plurality of models; and
the determining module is used for determining the final labeling result of the target data according to the occurrence frequency of the plurality of labeling results.
8. The apparatus of claim 7, wherein the labeling module is configured to label the target data sequentially and respectively by using a plurality of models to obtain a plurality of labeling results, and a labeling result of a later model in the plurality of models is obtained according to a labeling result of at least one earlier model.
9. The apparatus of claim 7, wherein the labeling module is configured to label the target data sequentially and respectively by using a plurality of models to obtain a plurality of labeling results, and starting from a third model of the plurality of models, the labeling results are obtained according to labeling results of two previous models.
10. The apparatus of claim 7, wherein the labeling module is configured to label the target data sequentially and respectively by using a plurality of models to obtain a plurality of labeling results, wherein, from a third model of the plurality of models, the labeling results are obtained according to the labeling results of two previous models; and, starting from a fourth model of the plurality of models, one of the labeling results of the two preceding models on which the labeling result is based is the labeling result of the previous model of the current model.
11. The apparatus according to any one of claims 7 to 10, wherein the determining module is configured to determine, according to the frequency of occurrence of the plurality of labeled results, a labeled result with the highest frequency of occurrence as a final labeled result of the target data.
12. The apparatus according to any one of claims 7 to 10, wherein the target data is any one of voice data, image data, or text data.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 6.
14. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1 to 6.
15. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 6.
CN202210598807.4A 2022-05-30 2022-05-30 Data labeling method, device, equipment and storage medium Pending CN114970724A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210598807.4A CN114970724A (en) 2022-05-30 2022-05-30 Data labeling method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210598807.4A CN114970724A (en) 2022-05-30 2022-05-30 Data labeling method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114970724A (en) 2022-08-30

Family

ID=82958525

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210598807.4A Pending CN114970724A (en) 2022-05-30 2022-05-30 Data labeling method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114970724A (en)

Similar Documents

Publication Publication Date Title
CN112597754B (en) Text error correction method, apparatus, electronic device and readable storage medium
CN112926306B (en) Text error correction method, device, equipment and storage medium
CN112988727B (en) Data annotation method, device, equipment, storage medium and computer program product
CN113657483A (en) Model training method, target detection method, device, equipment and storage medium
CN115358392B (en) Training method of deep learning network, text detection method and device
CN114187459A (en) Training method and device of target detection model, electronic equipment and storage medium
CN113378835A (en) Labeling model training method, sample labeling method and related device
CN113378855A (en) Method for processing multitask, related device and computer program product
CN113705362A (en) Training method and device of image detection model, electronic equipment and storage medium
CN113627536A (en) Model training method, video classification method, device, equipment and storage medium
CN114186681A (en) Method, apparatus and computer program product for generating model clusters
CN113360683A (en) Method for training cross-modal retrieval model and cross-modal retrieval method and device
CN114445682A (en) Method, device, electronic equipment, storage medium and product for training model
CN113592981B (en) Picture labeling method and device, electronic equipment and storage medium
CN114970724A (en) Data labeling method, device, equipment and storage medium
CN110895655A (en) Method and device for extracting text core phrase
CN114707638A (en) Model training method, model training device, object recognition method, object recognition device, object recognition medium and product
CN115641481A (en) Method and device for training image processing model and image processing
CN113360672A (en) Methods, apparatus, devices, media and products for generating a knowledge graph
CN113221519A (en) Method, apparatus, device, medium and product for processing tabular data
CN115312042A (en) Method, apparatus, device and storage medium for processing audio
CN112560437A (en) Text smoothness determination method and device and target model training method and device
CN114896986B (en) Method and device for enhancing training data of semantic recognition model
CN113360346B (en) Method and device for training model
CN113408300B (en) Model training method, brand word recognition device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination