CN110750694A - Data annotation implementation method and device, electronic equipment and storage medium - Google Patents

Data annotation implementation method and device, electronic equipment and storage medium

Info

Publication number
CN110750694A
Authority
CN
China
Prior art keywords
data
labeling
labeled
marked
annotation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910935375.XA
Other languages
Chinese (zh)
Inventor
孙震
杭圣烨
陈忻
张新琛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN201910935375.XA
Publication of CN110750694A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/903: Querying
    • G06F16/9032: Query formulation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/901: Indexing; Data structures therefor; Storage structures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/903: Querying
    • G06F16/9038: Presentation of query results

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data annotation implementation method, comprising: acquiring data to be labeled; distributing the data to be labeled to at least two labeling terminals; receiving the data labeled by the at least two labeling terminals; comparing the labeling results of the data labeled by the at least two labeling terminals; if the labeling results are consistent, storing the labeled data; and if the labeling results are inconsistent, sending the data to be labeled and the labeling results of the at least two labeling terminals to a designated terminal. The invention also discloses a data annotation implementation device, an electronic device, and a storage medium. The data annotation implementation method and device, electronic device, and storage medium provided by the embodiments of the invention can, to a certain extent, solve the problem of inaccurate data annotation information.

Description

Data annotation implementation method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method and an apparatus for implementing data annotation, an electronic device, and a storage medium.
Background
A data set (or dataset) is a collection of data, usually presented in tabular form. Each column represents a particular variable, and each row corresponds to a given member of the data set. The table lists the value of each variable for each member, such as the height and weight of an object. Each value is referred to as a datum. A data set may contain data for one or more members, corresponding to the number of rows.
For supervised deep learning projects, the quantity and quality of the data sets determine how well the project performs, so collecting and labeling data sets is an essential part of the project.
However, in the prior art, data is usually labeled by a single person and nobody checks whether the labeling result is correct. This creates a risk of inaccurate data annotation information, which in turn affects the accuracy of the model that is finally built.
Disclosure of Invention
In view of the above, an objective of the embodiments of the present invention is to provide a method and an apparatus for implementing data annotation, an electronic device, and a storage medium, which can solve the problem of inaccurate data annotation information to a certain extent.
Based on the above object, a first aspect of the embodiments of the present invention provides a data annotation implementation method, including:
acquiring data to be labeled;
distributing the data to be labeled to at least two labeling terminals;
receiving the data labeled by the at least two labeling terminals;
comparing the labeling results of the data labeled by the at least two labeling terminals;
if the labeling results are consistent, storing the labeled data; and
if the labeling results are inconsistent, sending the data to be labeled and the labeling results of the at least two labeling terminals to a designated terminal.
In a second aspect of the embodiments of the present invention, there is provided a data annotation implementation apparatus, including:
an acquisition module, configured to acquire data to be labeled;
a transceiver module, configured to distribute the data to be labeled to at least two labeling terminals and to receive the data labeled by the at least two labeling terminals;
a comparison module, configured to compare the labeling results of the data labeled by the at least two labeling terminals; and
a storage module, configured to store the labeled data if the labeling results are consistent;
wherein the transceiver module is further configured to send the data to be labeled and the labeling results of the at least two labeling terminals to a designated terminal if the labeling results are inconsistent.
In a third aspect of the embodiments of the present invention, there is provided an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the data annotation implementation method.
In a fourth aspect of the embodiments of the present invention, a computer-readable storage medium storing a computer program is provided, where the computer program, when executed by a processor, implements the steps of the data annotation implementation method.
As can be seen from the above, in the data annotation implementation method and apparatus, the electronic device, and the storage medium provided in the embodiments of the present invention, data to be labeled is distributed to at least two labeling terminals for labeling, and the labeling results of the at least two labeling terminals are compared. If the results are consistent, the labeled data is stored; if the results are inconsistent, the data to be labeled and the labeling results of the at least two labeling terminals are sent to a designated terminal, which makes the determination. Thus, on the one hand, the labeling results of the at least two labeling terminals corroborate each other; on the other hand, when those labeling results are inconsistent, they receive a final audit by the designated terminal, which ensures the accuracy and authoritativeness of the labeling results.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings of the embodiments will be briefly described below, and it is apparent that the drawings in the following description only relate to some embodiments of the present invention and are not limiting on the present invention.
Fig. 1 is a schematic architecture diagram of an embodiment of a data annotation implementation system according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of an embodiment of a data annotation implementation method according to an embodiment of the present invention;
Fig. 3A is a schematic diagram of uploading data through data point burying during test case execution according to an embodiment of the present invention;
Fig. 3B is a schematic diagram of the interface on which a labeling terminal labels the data to be labeled in an embodiment of the present invention;
Fig. 4 is a schematic flowchart of a data annotation implementation method according to another embodiment of the present invention;
Fig. 5 is a block diagram illustrating an embodiment of a data annotation implementation apparatus according to an embodiment of the present invention;
Fig. 6 is a schematic hardware structure diagram of an embodiment of the apparatus for implementing data annotation according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the drawings of the embodiments of the present invention. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the described embodiments of the invention without any inventive step, are within the scope of protection of the invention.
Unless defined otherwise, technical or scientific terms used herein shall have the ordinary meaning as understood by one of ordinary skill in the art to which this invention belongs. The use of "first," "second," and similar terms in this disclosure is not intended to indicate any order, quantity, or importance, but rather is used to distinguish one element from another. Also, the use of the terms "a," "an," or "the" and similar referents do not denote a limitation of quantity, but rather denote the presence of at least one. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", and the like are used merely to indicate relative positional relationships, and when the absolute position of the object being described is changed, the relative positional relationships may also be changed accordingly.
For annotating image data, there is an open-source tool, LabelImg (a visual image annotation tool), which supports creating data sets for image detection tasks. After the environment is installed and started locally, the corresponding object can be selected and annotated in the tool interface and then saved to a locally generated configuration file, whose data format is the same as the ImageNet format.
There is also a tool called Yolo_mark (image data marking software), which is suitable for producing data sets for image detection tasks, runs across platforms, and depends on the OpenCV library.
All of the above tools require the annotator to install the tool and set up a development environment locally. When there are many data sets, annotation is done locally, so annotators cannot share annotation information with one another, and annotation efficiency is very low.
Fig. 1 shows an architecture diagram of a data annotation implementation system according to an embodiment of the present invention. As shown in fig. 1, the system may include a data annotation implementation apparatus, a first annotation terminal, a second annotation terminal, and a designated terminal. The data annotation implementation apparatus can exchange data with the first annotation terminal, the second annotation terminal, and the designated terminal. The first annotation terminal, the second annotation terminal, and the designated terminal may each be, for example, a mobile phone, a tablet computer, a personal computer, a notebook computer, a personal digital assistant (PDA), a wearable device (e.g., smart glasses or a smart watch), and the like. The data annotation implementation apparatus may be implemented as a server.
In some scenarios, the data annotation realizing device can realize data exchange with the first annotation terminal, the second annotation terminal and the designated terminal through a network. The network may be a wired network or a wireless network.
In some scenarios, the first annotation terminal, the second annotation terminal, and the designated terminal may have software installed for exchanging data with the data annotation implementation apparatus, or they may exchange data with it through a web client. In either case, the three terminals can receive data to be labeled from the data annotation implementation apparatus and upload labeled data to it.
In addition, in some scenarios, the data annotation implementation apparatus may send the results of the first annotation terminal and the second annotation terminal labeling the same data to the designated terminal for audit, and the designated terminal may return to the data annotation implementation apparatus an audit result together with the data as labeled according to that audit result.
Referring to fig. 1, for example, in the data annotation implementation system according to the embodiment of the present invention, the data annotation implementation apparatus distributes the data to be labeled to the first annotation terminal and the second annotation terminal. The first and second annotation terminals each label the data and send the labeled data back to the data annotation implementation apparatus. The apparatus compares the labeling results of the labeled data. If the results are consistent, it stores the labeled data; if they are inconsistent, it sends the labeled data and the data to be labeled to the designated terminal, which judges whether the labeling results are accurate and returns the correct labeling result (or audit result) and the labeled data to the apparatus.
The data annotation implementation system provided by the embodiment of the present invention distributes the data to be labeled to at least two labeling terminals and then compares the labeling results. Consistent results indicate that the labeling is accurate, and the labeled data is stored. Inconsistent results indicate that the labeling may be inaccurate, so the data to be labeled and the labeling results of the at least two labeling terminals are sent to a designated terminal, which audits the labeling results and, according to the audit, either re-labels the data or modifies the labeling results of the at least two labeling terminals. In this way the system can, to a certain extent, solve the problem of inaccurate data annotation information.
Fig. 2 is a flowchart illustrating an embodiment of a data annotation implementation method according to an embodiment of the present invention.
As shown in fig. 2, the data annotation implementation method, optionally applied to a server, may include the following steps:
Step 11: acquire the data to be labeled.
In this step, the data to be labeled may be data in any form, for example picture data, voice data, text data, video data, 106-point facial landmark data, and the like. Any data that can be labeled in the data labeling field may serve as the data to be labeled acquired in this step.
Optionally, the step of acquiring the data to be labeled includes at least one of the following:
collecting the data to be labeled by using a data point burying (event tracking) technology; and
collecting the data to be labeled by using a crawler technology.
The data point burying technology is divided into three modes, primary, intermediate, and advanced, as follows:
Primary: statistical code is embedded at key conversion points of products and services, and independent IDs ensure that data is not collected repeatedly (for example, the click rate of a purchase button);
Intermediate: multiple code segments are embedded to track a user's series of behaviors on each interface of the platform, with each event independent of the others (for example, opening a product detail page, selecting a product model, adding to the shopping cart, placing an order, and completing the purchase);
Advanced: company engineering works together with ETL (extract, transform, load) to collect and analyze the full range of user behavior, build user profiles, and reconstruct the user behavior model as the basis for product analysis and optimization.
Any of the above modes can be used as the means of data point burying in this step; no specific limitation is imposed here.
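As an illustration of the primary mode, the following minimal Python sketch counts clicks on a purchase button while ignoring repeated uploads of the same event. The event fields (`event_id`, `name`) and the function name are assumptions made for this example; the source does not specify a record format.

```python
from collections import Counter


def count_unique_events(events):
    """Count events per name, skipping repeats of the same independent ID."""
    seen_ids = set()
    counts = Counter()
    for event in events:
        if event["event_id"] in seen_ids:
            continue  # duplicate upload of an event that was already counted
        seen_ids.add(event["event_id"])
        counts[event["name"]] += 1
    return counts


# Hypothetical events: "e1" was uploaded twice, for example after a retry.
events = [
    {"event_id": "e1", "name": "purchase_click"},
    {"event_id": "e1", "name": "purchase_click"},
    {"event_id": "e2", "name": "purchase_click"},
]
click_counts = count_unique_events(events)
```

Keying the deduplication on the independent ID rather than on the event content means that two genuine clicks with identical content are still counted separately.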
Data point burying can be implemented in many ways. For example, tracking points are embedded in a mobile phone application, and certain data (such as the screenshot and click position of each operation) is uploaded to a back-end server, where it is collected; or step-level case data is collected in automated test cases and uploaded to a back-end server for collection.
One example is a method for uploading data from an automated test case. As shown in fig. 3A, while an automated User Interface (UI) test case is executing, data such as the mobile phone page, the device information, and the operation step is uploaded to a back-end server after each operation step completes, thereby collecting the data set.
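A minimal sketch of the record that such a test case might assemble after each step is shown below. The field names, the example values, and the use of JSON as the upload format are all assumptions for illustration; the actual upload call to the back-end server is omitted.

```python
import json


def build_step_record(case_id, step_index, screenshot_path, device_info, operation):
    """Assemble the data to upload after one test-case step completes."""
    return {
        "case_id": case_id,
        "step": step_index,
        "screenshot": screenshot_path,  # page screenshot taken after the step
        "device": device_info,
        "operation": operation,
    }


# Hypothetical values for a single step of a login test case.
record = build_step_record(
    case_id="login_case",
    step_index=1,
    screenshot_path="shots/login_case_1.png",
    device_info={"model": "demo-phone", "os": "demo-os"},
    operation="tap login button",
)
payload = json.dumps(record, sort_keys=True)  # body of the upload request
```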
A web crawler (also called a web spider or web robot) is a program or script that automatically captures web information according to certain rules. A web crawler automatically extracts web pages: it downloads pages from the World Wide Web for a search engine and is an important component of the search engine.
According to system structure and implementation technology, web crawlers can be broadly classified into the following types: General Purpose Web Crawler, Focused Web Crawler, Incremental Web Crawler, and Deep Web Crawler.
In this step, the data to be labeled may be collected by using any one of the above crawler technologies or any combination of them, which is not limited herein.
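To make the page-processing part of a crawler concrete, the sketch below extracts image URLs from an already downloaded HTML page using only the Python standard library. It is not a complete crawler: fetching pages, scheduling, and politeness rules are omitted, and the class name, function name, and `example.com` URLs are placeholders chosen for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageLinkParser(HTMLParser):
    """Collect absolute URLs of all <img> tags found in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths against the URL of the page.
                self.image_urls.append(urljoin(self.base_url, src))


def extract_image_urls(html, base_url):
    parser = ImageLinkParser(base_url)
    parser.feed(html)
    return parser.image_urls


sample_html = (
    '<html><body><img src="/a.png">'
    '<img src="http://example.com/b.jpg"><p>text</p></body></html>'
)
urls = extract_image_urls(sample_html, "http://example.com/page")
```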
Optionally, when the crawler technology is used, the data to be labeled may be crawled automatically and uploaded to a server by calling the Application Programming Interfaces (APIs) of image search engines such as Baidu Images and Google Images.
Optionally, after being acquired by the data point burying technology or the crawler technology, the data to be labeled can be stored in a local database for later labeling. In this case, a MySQL relational database may be used to store the data.
Optionally, in this step, the acquired data may be normalized and then stored. The normalization method is not particularly limited; any known normalization method may be used.
Different evaluation indexes often have different dimensions (for example, area, price, and floor when evaluating house prices, or height and weight when predicting the likelihood that a person has a disease) and different units (for example, square meters or square centimeters for area, and meters or centimeters for height). These differences affect the results of data analysis. To eliminate the influence of dimension among indexes, the data must be standardized so that the indexes become comparable.
After the acquired data is normalized, the indexes are on the same order of magnitude, which makes the data suitable for comprehensive comparative evaluation, subsequent modeling, and the like.
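The source leaves the normalization method open. As a worked illustration, the sketch below applies two widely known methods, min-max scaling and z-score standardization, to height and weight columns. The function names and sample values are assumptions for this example.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Scale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: no spread to scale
    return [(v - lo) / (hi - lo) for v in values]


def z_score_normalize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Two indexes with different units end up on the same scale.
heights_cm = [150.0, 160.0, 170.0, 180.0]
weights_kg = [45.0, 60.0, 75.0, 90.0]
heights_scaled = min_max_normalize(heights_cm)
weights_scaled = min_max_normalize(weights_kg)
heights_z = z_score_normalize(heights_cm)
```

After scaling, the two columns can be compared directly even though one was measured in centimeters and the other in kilograms.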
In one embodiment, the data to be labeled is picture data, and the method is applied to labeling picture data. The step of normalizing the acquired data may include:
if the acquired data is picture data, converting it into a picture with a predetermined length-width ratio (for example, by a resize operation) and compressing the picture to a predetermined size (for example, by using a compression algorithm of OpenCV).
Here, the predetermined length-width ratio and the predetermined size are set as needed, and specific values are not limited herein. In this embodiment, the normalization also saves storage space. After normalization, the data is written to the database; the stored information includes the address at which the data to be labeled is stored on the server, the length and width of the picture, the uploading IP address, the device information, and the like.
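The stored fields listed above can be sketched as a single table. The example below uses Python's built-in `sqlite3` module with an in-memory database purely so that it is self-contained; it stands in for the MySQL database mentioned in the text. The table name, column names, and sample values are assumptions.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the server-side database.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE pending_data (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        file_path TEXT NOT NULL UNIQUE,  -- where the picture is stored on the server
        width INTEGER NOT NULL,
        height INTEGER NOT NULL,
        upload_ip TEXT,
        device_info TEXT
    )
    """
)


def save_pending(conn, file_path, width, height, upload_ip, device_info):
    """Insert one normalized picture record that is waiting to be labeled."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO pending_data "
            "(file_path, width, height, upload_ip, device_info) "
            "VALUES (?, ?, ?, ?, ?)",
            (file_path, width, height, upload_ip, device_info),
        )


save_pending(conn, "/data/pending/step_001.png", 720, 1280, "192.0.2.10", "demo-phone")
row = conn.execute("SELECT file_path, width, height FROM pending_data").fetchone()
```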
Step 12: distribute the data to be labeled to at least two labeling terminals.
In this step, the data to be labeled is distributed to the labeling terminals so that they can label it; after labeling is finished, each labeling terminal returns the labeled data. Here, the data to be labeled may be a single item or a data packet composed of multiple items, and the specific distribution manner may be set as required, which is not limited herein.
Optionally, the same data to be labeled needs to be distributed to at least two labeling terminals so that their labeling results can be compared later. There is no restriction on which labeling terminals receive a given item; any terminal that can perform the data labeling operation may be used. The number of labeling terminals that receive the same item is likewise not limited and may be two, three, four, or more.
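One possible distribution scheme is sketched below: each item is assigned to a fixed number of distinct terminals in round-robin order. The source does not prescribe a scheme, so the round-robin choice, the function name, and the identifiers are assumptions.

```python
def distribute(item_ids, terminal_ids, copies=2):
    """Assign every item to `copies` distinct terminals, round-robin."""
    if copies < 2:
        raise ValueError("each item must go to at least two terminals")
    if copies > len(terminal_ids):
        raise ValueError("not enough terminals for the requested copies")
    assignments = {terminal: [] for terminal in terminal_ids}
    n = len(terminal_ids)
    for index, item_id in enumerate(item_ids):
        # Consecutive offsets modulo n are distinct because copies <= n.
        for offset in range(copies):
            assignments[terminal_ids[(index + offset) % n]].append(item_id)
    return assignments


assignments = distribute(["a.png", "b.png", "c.png"], ["t1", "t2", "t3"], copies=2)
```

Round-robin keeps the workload roughly balanced across terminals; a real system might instead weight the assignment by annotator availability.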
Step 13: receive the data labeled by the at least two labeling terminals.
Here, the labeled data may be, for example, data carrying a label or tag, where the label or tag describes information such as an attribute or category of the data to be labeled.
Step 14: compare the labeling results of the data labeled by the at least two labeling terminals.
Here, the labeling result is obtained from the label or tag carried by the labeled data and represents the attribute, category, or similar information assigned to the data during labeling.
Optionally, when the data to be labeled is distributed as a data packet containing multiple items, the same labeling terminal may label several different items. Therefore, before the labeling results are compared, it is necessary to determine which of the received labeled data correspond to the same item; this can be determined from a unique identifier of the data to be labeled (e.g., its file name).
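A minimal sketch of this matching step follows. It groups the received records by file name so that the labels assigned to the same item by different terminals end up side by side. The record fields (`file_name`, `terminal_id`, `label`) are assumptions for illustration.

```python
from collections import defaultdict


def group_by_item(labeled_records):
    """Map each unique file name to the labels given by each terminal."""
    grouped = defaultdict(dict)
    for record in labeled_records:
        grouped[record["file_name"]][record["terminal_id"]] = record["label"]
    return dict(grouped)


# Hypothetical records returned by two terminals for a two-item packet.
records = [
    {"file_name": "a.png", "terminal_id": "t1", "label": "login page"},
    {"file_name": "b.png", "terminal_id": "t1", "label": "recharge page"},
    {"file_name": "a.png", "terminal_id": "t2", "label": "login page"},
    {"file_name": "b.png", "terminal_id": "t2", "label": "login page"},
]
grouped = group_by_item(records)
```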
Step 15: if the labeling results are consistent, at least two annotators agree on the labeling result, so the result is accepted as accurate and the labeled data is stored (for example, in a local database or in a folder dedicated to labeled data).
It should be noted that when three or more labeling terminals label the same data, consistency of the labeling results may mean that the results of all labeling terminals are identical, or that a majority of the results are identical. The specific setting may be selected as needed, which is not limited herein.
Step 16: if the labeling results are inconsistent, at least two annotators disagree on the labeling result, so its accuracy is not accepted. The data to be labeled and the labeling results of the at least two labeling terminals are sent to a designated terminal, which adjudicates them.
It should be noted that when three or more labeling terminals label the same data, inconsistency of the labeling results may mean that the results of the labeling terminals all differ from one another, or that the number of identical results is below a certain threshold. The specific setting may be selected as needed, which is not limited herein.
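The consistency rules of steps 15 and 16 can be sketched as a single function with an adjustable agreement threshold. By default all labels must match; passing a smaller threshold yields a majority-style rule. The function name, parameter names, and return convention are assumptions for this example.

```python
from collections import Counter


def compare_labels(labels, min_agreeing=None):
    """Return (True, label) if enough terminals agree, else (False, None).

    labels: labels given to one item by different terminals.
    min_agreeing: number of identical labels required; defaults to all.
    """
    if len(labels) < 2:
        raise ValueError("at least two labeling results are required")
    required = len(labels) if min_agreeing is None else min_agreeing
    label, count = Counter(labels).most_common(1)[0]
    if count >= required:
        return True, label  # step 15: store the labeled data
    return False, None      # step 16: send to the designated terminal


unanimous = compare_labels(["login page", "login page"])
disputed = compare_labels(["login page", "recharge page"])
majority = compare_labels(["login page", "login page", "recharge page"], min_agreeing=2)
```

With three terminals and `min_agreeing=2`, a single dissenting label no longer triggers an audit, which trades some scrutiny for a lighter load on the designated terminal.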
Optionally, the data annotation implementation method may further include: assigning permissions to the labeling terminals, where the designated terminal is a labeling terminal with audit permission.
Here, the system assigns permissions to each labeling terminal. Only the designated terminal has permission to audit labeling results, and it may also have the basic labeling permission. In other words, the designated terminal is a labeling terminal with audit permission, comparable to an administrator role.
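One way to model this permission assignment is with bit flags, as in the sketch below. The permission names, terminal identifiers, and lookup function are assumptions; the source only states that the designated terminal holds audit permission and may also hold labeling permission.

```python
from enum import Flag, auto


class Permission(Flag):
    LABEL = auto()
    AUDIT = auto()


# Hypothetical assignment: only the designated terminal may audit.
terminal_permissions = {
    "terminal_1": Permission.LABEL,
    "terminal_2": Permission.LABEL,
    "designated": Permission.LABEL | Permission.AUDIT,
}


def can_audit(terminal_id):
    """True if the terminal is known and holds the audit permission."""
    granted = terminal_permissions.get(terminal_id, Permission(0))
    return bool(granted & Permission.AUDIT)
```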
As an optional embodiment, after step 13 of receiving the data labeled by the at least two labeling terminals, the method further includes: if the labeled data includes a predetermined cleaning mark, deleting the labeled data. Here, the predetermined cleaning mark indicates that the data is irrelevant and can be deleted directly without being stored.
Optionally, as a safeguard, the labeled data is deleted only when the labeled data from both labeling terminals includes the predetermined cleaning mark; if not all of the labeling terminals' labeled data includes the predetermined cleaning mark, the data to be labeled is sent to the designated terminal for adjudication, thereby preventing useful data from being deleted by mistake.
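The safeguard can be sketched as a small decision function. This example reads the rule as follows: delete only when every terminal applied the cleaning mark, escalate to the designated terminal when only some did, and otherwise continue with the normal comparison. That reading, the mark value, and the returned action names are assumptions.

```python
CLEANING_MARK = "__discard__"  # hypothetical value of the predetermined cleaning mark


def cleaning_decision(labels):
    """Decide what to do with one item given every terminal's label."""
    flagged = [label == CLEANING_MARK for label in labels]
    if all(flagged):
        return "delete"    # every annotator considers the data irrelevant
    if any(flagged):
        return "escalate"  # disagreement: let the designated terminal decide
    return "compare"       # no cleaning mark: proceed with normal comparison


both_flagged = cleaning_decision([CLEANING_MARK, CLEANING_MARK])
one_flagged = cleaning_decision([CLEANING_MARK, "login page"])
none_flagged = cleaning_decision(["login page", "login page"])
```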
As another optional embodiment, after step 13 of receiving the data labeled by the at least two labeling terminals, the method further includes:
if the labeled data includes a predetermined cleaning mark, sending the labeled data that includes the predetermined cleaning mark to the designated terminal;
receiving the designated terminal's audit result for the labeled data that includes the predetermined cleaning mark; and
determining, according to that audit result, whether to delete the labeled data.
It can be seen that, in this embodiment, the designated terminal makes the final decision on whether to delete labeled data carrying the predetermined cleaning mark, which prevents labeling terminals with lower-level permissions from deleting useful data by mistake.
As can be seen from the foregoing embodiments, in the data annotation implementation method provided in the embodiments of the present invention, data to be labeled is distributed to at least two labeling terminals for labeling, and the labeling results of the at least two labeling terminals are compared. If the results are consistent, the labeled data is stored; if the results are inconsistent, the data to be labeled and the labeling results of the at least two labeling terminals are sent to a designated terminal, which makes the determination. Thus, on the one hand, the labeling results of the at least two labeling terminals corroborate each other; on the other hand, when those labeling results are inconsistent, they receive a final audit by the designated terminal, which ensures the accuracy and authoritativeness of the labeling results.
As an alternative embodiment, as shown in fig. 2, after the step of sending the data to be labeled and the labeling results of the at least two labeling terminals to a designated terminal, the method further includes:
Step 17: receive the designated terminal's audit result for the labeling results, together with the data that the designated terminal has labeled according to the audit result.
Optionally, the audit result may include the label or tag assigned to the data by the designated terminal, and may further include an evaluation and analysis of the labeling results of the at least two labeling terminals, such as where and why those results went wrong. In addition, the audit result can be displayed visually for internal reference and improvement.
Step 18: store the data that the designated terminal has labeled according to the audit result, for subsequent use.
Optionally, the data annotation implementation method further includes:
Step 19: return the designated terminal's audit result for the labeling results to the at least two labeling terminals. The labeling terminals can display the audit result visually so that their annotators can consult it, learn from it, and improve their work accordingly.
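Steps 17 to 19 can be sketched as one function that takes the audit result, produces the record to store, and builds per-terminal feedback. The audit fields (`final_label`, `comment`) and the feedback layout are assumptions made for this example.

```python
def apply_audit(item_id, terminal_labels, audit):
    """Turn an audit result into a stored record plus feedback per terminal.

    terminal_labels: mapping of terminal ID to the label it submitted.
    audit: dict with "final_label" and an optional "comment".
    """
    final_record = {  # step 18: stored for subsequent use
        "item_id": item_id,
        "label": audit["final_label"],
        "source": "audit",
    }
    feedback = {  # step 19: returned to each labeling terminal
        terminal_id: {
            "item_id": item_id,
            "your_label": label,
            "final_label": audit["final_label"],
            "correct": label == audit["final_label"],
            "comment": audit.get("comment", ""),
        }
        for terminal_id, label in terminal_labels.items()
    }
    return final_record, feedback


final_record, feedback = apply_audit(
    "b.png",
    {"t1": "recharge page", "t2": "login page"},
    {"final_label": "recharge page", "comment": "page shows a top-up amount field"},
)
```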
As an embodiment, the data to be labeled is test case picture data, the method is applied to labeling test case picture data, and the data to be labeled is collected using data point burying (event tracking) technology.
As shown in fig. 3A, while the test case is executed, the screenshot and the device information corresponding to each step of the execution flow are uploaded by means of data point burying, which completes the collection of the data to be annotated.
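A minimal sketch of this per-step collection is given below. It assumes that each test step is a callable returning raw screenshot bytes and that `upload` is whatever transport the collection point uses; both are hypothetical, as are the record field names.

```python
import json
from typing import Callable, Dict, List


def run_test_case(steps: List[Callable[[], bytes]],
                  device_info: Dict[str, str],
                  upload: Callable[[str], None]) -> None:
    """Run each step and upload its screenshot together with device info."""
    for index, step in enumerate(steps, start=1):
        screenshot = step()  # assumption: a step returns its screenshot bytes
        record = {
            "step": index,
            "device": device_info,
            # Raw bytes are not JSON-serializable, so hex-encode them here.
            "screenshot_hex": screenshot.hex(),
        }
        upload(json.dumps(record))
```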
Fig. 3B shows the interface in which the labeling terminal labels the data to be labeled. The data to be labeled is a screenshot of a step of the test case. An annotator can select the corresponding label or tag, such as login page or recharge page, from a drop-down box in the interface to complete one image-classification labeling, and then click the submit tag button to complete one image upload, which finishes the labeling process; the labeling terminal then sends the labeled data to the data labeling implementation device.
Optionally, the data annotation implementation method may be implemented in a web-end manner, so that both the annotation terminal and the designated terminal can enter an annotation page (the page may refer to fig. 3B) through a browser, and begin to annotate after logging in. After the data annotation is realized through the web end, data sharing and simultaneous online annotation of multiple persons can be realized, the data annotation cost and the data set manufacturing cost are reduced, and the annotation efficiency can be improved to a certain extent.
Next, the data already labeled in the foregoing steps can be used for model construction. As an alternative embodiment, as shown in fig. 4, the data labeling implementation method further includes:
Step 21: acquiring the stored marked data.
Step 22: constructing sample data by using the marked data.
Here, all or part of the labeled data may be selected to construct sample data.
Optionally, before the sample data is constructed, the data can be packaged into a data set format that can be read by the machine learning model.
Step 23: constructing and training the target model from the sample data by means of a preset machine learning algorithm. Optionally, the preset machine learning algorithm may be a deep learning algorithm or the like; the specific algorithm is not particularly limited.
Therefore, after the data are marked, the data can be used for model construction, and the method is very convenient to use. It should be noted that the target model constructed here is not limited to a certain type, and may be determined according to actual requirements and the nature of the data marked, and is not limited here.
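As an illustration of Steps 21 and 22, the sketch below packages stored labels into index-encoded training and validation samples. The function name `build_samples`, the 80/20 split, and the fixed shuffle seed are assumptions; the embodiments leave the data set format and the learning algorithm open, so training itself is not shown.

```python
import random
from typing import Dict, List, Tuple

Sample = Tuple[str, int]  # (item id, class index)


def build_samples(labeled: Dict[str, str],
                  val_fraction: float = 0.2,
                  seed: int = 0) -> Tuple[List[Sample], List[Sample], List[str]]:
    """Turn {item id: label} into train/validation samples plus a class list."""
    classes = sorted(set(labeled.values()))
    class_index = {name: i for i, name in enumerate(classes)}
    samples = [(item_id, class_index[label])
               for item_id, label in sorted(labeled.items())]
    # Deterministic shuffle so repeated runs give the same split.
    random.Random(seed).shuffle(samples)
    n_val = int(len(samples) * val_fraction)
    return samples[n_val:], samples[:n_val], classes
```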
Optionally, as shown in fig. 4, the data annotation implementation method further includes:
Step 24: monitoring the stored marked data at a preset time interval.
Here, the predetermined time interval is set as needed, and may be 10 minutes, 1 hour, 2 hours, etc., and is not particularly limited herein.
Optionally, the monitoring of the stored labeled data mainly includes detecting incremental data in the labeled data. Optionally, the incremental data herein may refer to newly added annotated data after the last modeling is completed.
Step 25: if the incremental data in the marked data reaches a preset incremental data amount threshold, constructing new sample data by using the marked data.
When the incremental data reaches the preset incremental data amount threshold, a new round of training is triggered automatically. New sample data is first constructed from the existing full data; all or part of the labeled data in the existing full data may be selected for this purpose.
Optionally, before constructing new sample data, the data can be packaged into a data set format that can be read by the machine learning model.
Step 26: constructing and training a new target model from the new sample data by means of a preset machine learning algorithm.
Therefore, real-time model training is realized by monitoring incremental data at regular time, and the research and development efficiency of deep learning projects is improved.
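The threshold-triggered retraining of Steps 24 to 26 can be sketched as below. The class name `RetrainMonitor` and the `retrain` callback are hypothetical, and the sketch assumes that incremental data is measured as the number of items added since the last training and that stored items are not deleted in between. A scheduler of the implementer's choice would call `check` once per preset time interval.

```python
from typing import Callable, Dict


class RetrainMonitor:
    """Trigger retraining when enough new labeled items have accumulated."""

    def __init__(self, threshold: int,
                 retrain: Callable[[Dict[str, str]], None]) -> None:
        self.threshold = threshold
        self.retrain = retrain
        self.count_at_last_training = 0

    def check(self, store: Dict[str, str]) -> bool:
        """Run one monitoring pass; return True if retraining was triggered."""
        increment = len(store) - self.count_at_last_training
        if increment < self.threshold:
            return False
        # New sample data is built from the full existing data,
        # not only from the increment.
        self.retrain(dict(store))
        self.count_at_last_training = len(store)
        return True
```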
Fig. 5 is a block diagram illustrating an embodiment of a data annotation implementation apparatus according to an embodiment of the present invention.
As shown in fig. 5, the data annotation implementing apparatus 30 includes:
the acquiring module 31 is used for acquiring data to be marked;
the transceiver module 32 is configured to distribute the data to be labeled to at least two labeling terminals; and receiving the data labeled by the at least two labeling terminals;
the comparison module 33 is configured to compare the labeling results of the data labeled by the at least two labeling terminals;
the storage module 34 is used for storing the marked data if the marking results are consistent;
if the labeling results are not consistent, the transceiver module 32 is configured to send the data to be labeled and the labeling results of the at least two labeling terminals to a designated terminal.
As can be seen from the foregoing embodiments, the data annotation implementing apparatus provided in the embodiments of the present invention distributes the data to be annotated to at least two annotation terminals for annotation and compares their annotation results. If the results are consistent, the annotated data is stored; if they are inconsistent, the data to be annotated and the annotation results of the at least two annotation terminals are sent to a designated terminal, which makes the determination. In this way, on the one hand, the annotation results of the at least two annotation terminals cross-check one another; on the other hand, when those results are inconsistent, they are finally audited by the designated terminal, which ensures the accuracy and authority of the annotation results.
As an optional embodiment, the transceiver module 32 is configured to receive an audit result of the specified terminal on the tagging result and data that is tagged to the data to be tagged according to the audit result;
the storage module 34 is configured to store the data that is labeled on the data to be labeled according to the audit result.
As an optional embodiment, the transceiver module 32 is configured to return an audit result of the specified terminal to the tagging result and data that is tagged to the data to be tagged according to the audit result to the at least two tagging terminals.
As an optional embodiment, the data annotation implementing apparatus 30 further includes an authority distributing module 35, configured to distribute the authority to the annotation terminal; and the appointed terminal is a labeling terminal with an audit authority.
As an optional embodiment, the data annotation implementation apparatus 30 further includes a model building module 36, configured to:
acquiring the stored marked data;
constructing sample data by using the marked data;
and constructing and training the target model by using the sample data through a preset machine learning algorithm.
As an alternative embodiment, the model building module 36 is further configured to:
monitoring the stored marked data according to a preset time interval;
if the incremental data in the marked data reach a preset incremental data amount threshold value, constructing new sample data by using the marked data;
and constructing and training by using the new sample data through a preset machine learning algorithm to obtain a new target model.
As an optional embodiment, the obtaining module 31 is configured to implement at least one of the following steps:
collecting data to be marked by using a data point burying technology; and
and collecting data to be marked by utilizing a crawler technology.
As an optional embodiment, the storage module 34 is configured to perform normalization processing on the acquired data and then store the normalized data.
As an optional embodiment, if the acquired data is picture data, the storage module 34 is configured to convert the picture data into a picture with a predetermined aspect ratio and compress the picture to a predetermined size.
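One way to bring a picture to a predetermined aspect ratio is a centered crop, sketched below. The embodiments do not say whether the conversion crops, pads, or stretches, so the cropping choice and the function name `center_crop_box` are assumptions. The resulting box could then be passed to any image library to crop the picture and scale it to the predetermined size.

```python
from typing import Tuple


def center_crop_box(width: int, height: int,
                    ratio_w: int, ratio_h: int) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the largest centered region
    with aspect ratio ratio_w:ratio_h inside a width x height picture."""
    if width * ratio_h > height * ratio_w:
        # Picture is too wide: keep the full height and trim the sides.
        new_w = height * ratio_w // ratio_h
        left = (width - new_w) // 2
        return left, 0, left + new_w, height
    # Picture is too tall (or already matches): keep the full width.
    new_h = width * ratio_h // ratio_w
    top = (height - new_h) // 2
    return 0, top, width, top + new_h
```

For example, a 1080 x 2340 screenshot cropped to 9:16 keeps the full width and the middle 1920 rows.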
As an optional embodiment, the data annotation implementing device 30 further includes a deleting module 37;
and if the marked data comprises a preset cleaning mark, the deleting module is used for deleting the marked data.
As an optional embodiment, the data annotation implementing device 30 further includes a deleting module 37;
if the data which is marked completely comprises a preset cleaning mark, the transceiver module is used for sending the marked data which comprises the preset cleaning mark to the appointed terminal and receiving an auditing result of the appointed terminal on the marked data which comprises the preset cleaning mark;
and the deleting module is used for determining whether to delete the marked data according to the auditing result of the marked data comprising the preset cleaning mark.
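The two cleaning variants above (direct deletion, and deletion only after the designated terminal confirms) can be sketched in one function. The sentinel value `CLEAN_MARK`, the function name `handle_cleaning`, and the `audit` callback are hypothetical; the embodiments do not specify how the preset cleaning mark is encoded.

```python
from typing import Callable, Dict, Optional

CLEAN_MARK = "__clean__"  # hypothetical encoding of the preset cleaning mark


def handle_cleaning(item_id: str,
                    label: str,
                    store: Dict[str, str],
                    audit: Optional[Callable[[str], bool]] = None) -> bool:
    """Delete an item that carries the cleaning mark; return True if deleted.

    If `audit` is given, it stands in for the designated terminal and must
    approve the deletion; otherwise the item is deleted directly.
    """
    if label != CLEAN_MARK:
        return False
    if audit is not None and not audit(item_id):
        return False  # the designated terminal rejected the cleaning request
    store.pop(item_id, None)
    return True
```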
As an optional embodiment, the data to be labeled is picture data, and the device is applied to labeling the picture data.
As an optional embodiment, the data to be labeled is test case picture data, and the device is applied to labeling of the test case picture data.
It should be noted that each embodiment of the data annotation implementing apparatus basically corresponds to the embodiment of the data annotation implementing method, and therefore, the technical effect of the data annotation implementing apparatus is basically the same as that of the data annotation implementing method, and is not described herein again.
Fig. 6 is a hardware structural diagram illustrating an embodiment of an apparatus for performing the data annotation implementation method according to the present invention.
As shown in fig. 6, the apparatus includes:
one or more processors 41 and memory 42, with one processor 41 being an example in fig. 6.
The apparatus for implementing the data annotation method may further include: an input device 43 and an output device 44.
The processor 41, the memory 42, the input device 43 and the output device 44 may be connected by a bus or other means, and fig. 6 illustrates the connection by a bus as an example.
The memory 42 is a non-volatile computer-readable storage medium, and can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as program instructions/modules corresponding to the data annotation implementation method in the embodiment of the present application (for example, the obtaining module 31, the transceiver module 32, the comparing module 33, and the storage module 34 shown in fig. 5). The processor 41 executes various functional applications of the server and data processing, namely, implements the data annotation implementation method of the above-described method embodiment, by executing the nonvolatile software program, instructions and modules stored in the memory 42.
The memory 42 may include a storage program area and a storage data area, wherein the storage program area may store an operating system and an application program required for at least one function; the storage data area may store data created according to use of the data annotation implementing device, and the like. Further, the memory 42 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, memory 42 may optionally include memory located remotely from processor 41, which may be connected to the data annotation implementing device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 43 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the data annotation realization device. The output device 44 may include a display device such as a display screen.
The one or more modules are stored in the memory 42 and, when executed by the one or more processors 41, perform the data annotation implementation method of any of the method embodiments described above. The technical effect of the embodiment of the device for executing the data annotation implementation method is the same as or similar to that of any method embodiment.
Embodiments of the present application provide a non-transitory computer storage medium storing computer-executable instructions that can execute the data annotation implementation method in any of the above method embodiments. Embodiments of the non-transitory computer storage medium may be the same or similar in technical effect to any of the method embodiments described above.
Finally, it should be noted that, as will be understood by those skilled in the art, all or part of the processes in the methods of the above embodiments may be implemented by a computer program that can be stored in a computer-readable storage medium and that, when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like. The technical effect of the embodiment of the computer program is the same as or similar to that of any of the method embodiments described above.
Furthermore, the apparatuses, devices, etc. described in the present disclosure may be various electronic terminal devices, such as a mobile phone, a Personal Digital Assistant (PDA), a tablet computer (PAD), a smart television, etc., and may also be large terminal devices, such as a server, etc., and therefore the scope of protection of the present disclosure should not be limited to a specific type of apparatus, device. The client disclosed by the present disclosure may be applied to any one of the above electronic terminal devices in the form of electronic hardware, computer software, or a combination of both.
Furthermore, the method according to the present disclosure may also be implemented as a computer program executed by a CPU, which may be stored in a computer-readable storage medium. The computer program, when executed by the CPU, performs the above-described functions defined in the method of the present disclosure.
Further, the above method steps and system elements may also be implemented using a controller and a computer readable storage medium for storing a computer program for causing the controller to implement the functions of the above steps or elements.
Further, it should be appreciated that the computer-readable storage media (e.g., memory) described herein can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. By way of example, and not limitation, nonvolatile memory can include Read Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM), which can act as external cache memory. By way of example and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The storage devices of the disclosed aspects are intended to comprise, without being limited to, these and other suitable types of memory.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
The various illustrative logical blocks, modules, and circuits described in connection with the disclosure herein may be implemented or performed with the following components designed to perform the functions described herein: a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination of these components. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
The steps of a method or algorithm described in connection with the disclosure herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
In one or more exemplary designs, the functions may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media include both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general purpose or special purpose computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a general-purpose or special-purpose computer, or a general-purpose or special-purpose processor. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
While the foregoing discloses exemplary embodiments, it should be noted that various changes and modifications could be made herein without departing from the scope of the disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosure may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular forms "a," "an," "the" are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
The above-mentioned serial numbers of the embodiments of the present disclosure are merely for description and do not represent the merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples. Within the idea of the embodiments of the invention, technical features in the above embodiment or in different embodiments may also be combined, and there are many other variations of the different aspects of the embodiments of the invention as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.

Claims (28)

1. A data annotation implementation method is characterized by comprising the following steps:
acquiring data to be marked;
distributing the data to be labeled to at least two labeling terminals;
receiving the data labeled by the at least two labeling terminals;
comparing the labeling results of the data labeled by the at least two labeling terminals;
if the labeling results are consistent, storing the labeled data;
and if the labeling results are not consistent, sending the data to be labeled and the labeling results of the at least two labeling terminals to a specified terminal.
2. The method according to claim 1, wherein after the step of sending the data to be labeled and the labeling results of the at least two labeling terminals to a specified terminal, the method further comprises:
receiving an auditing result of the designated terminal for the labeling result and data which is labeled on the data to be labeled according to the auditing result;
and storing the data which is labeled on the data to be labeled according to the auditing result.
3. The method of claim 2, further comprising:
and returning the auditing result of the specified terminal to the labeling result to the at least two labeling terminals.
4. The method of claim 1, further comprising: carrying out authority distribution on the labeling terminal; and the appointed terminal is a labeling terminal with an audit authority.
5. The method of claim 1, further comprising:
acquiring the stored marked data;
constructing sample data by using the marked data;
and constructing and training the target model by using the sample data through a preset machine learning algorithm.
6. The method of claim 5, further comprising:
monitoring the stored marked data according to a preset time interval;
if the incremental data in the marked data reach a preset incremental data amount threshold value, constructing new sample data by using the marked data;
and constructing and training by using the new sample data through a preset machine learning algorithm to obtain a new target model.
7. The method according to claim 1, wherein the obtaining of the data to be labeled comprises at least one of the following steps:
collecting data to be marked by using a data point burying technology; and
and collecting data to be marked by utilizing a crawler technology.
8. The method of claim 7, wherein obtaining data to be labeled comprises:
and carrying out normalization processing on the acquired data and then storing the data.
9. The method of claim 8, wherein normalizing the collected data comprises:
and if the acquired data is picture data, converting the picture data into a picture with a preset length-width ratio, and compressing the picture to a preset size.
10. The method according to claim 1, wherein the step of receiving the annotated data of the at least two annotation terminals is followed by further comprising:
and if the marked data comprise the preset cleaning mark, deleting the marked data.
11. The method according to claim 1, wherein the step of receiving the annotated data of the at least two annotation terminals is followed by further comprising:
if the marked data comprise a preset cleaning mark, sending the marked data comprising the preset cleaning mark to the appointed terminal;
receiving an auditing result of the marked data comprising the preset cleaning mark by the designated terminal;
and determining whether to delete the marked data according to the auditing result of the marked data comprising the preset cleaning mark.
12. The method according to claim 1, wherein the data to be labeled is picture data, and the method is applied to labeling picture data.
13. The method according to claim 12, wherein the data to be labeled is test case picture data, and the method is applied to test case picture data labeling.
14. A data annotation realization device, comprising:
the acquisition module is used for acquiring data to be marked;
the receiving and sending module is used for distributing the data to be labeled to at least two labeling terminals; and receiving the data labeled by the at least two labeling terminals;
the comparison module is used for comparing the labeling results of the data labeled by the at least two labeling terminals;
the storage module is used for storing the marked data if the marking results are consistent;
and if the labeling results are not consistent, the transceiver module is used for sending the data to be labeled and the labeling results of the at least two labeling terminals to a designated terminal.
15. The apparatus according to claim 14, wherein the transceiver module is configured to receive an audit result of the designated terminal on the tagging result and data that is tagged to the data to be tagged according to the audit result;
and the storage module is used for storing the data which is labeled on the data to be labeled according to the auditing result.
16. The apparatus according to claim 15, wherein the transceiver module is configured to return an audit result of the specified terminal on the annotation result to the at least two annotation terminals.
17. The device of claim 14, further comprising a permission assignment module, configured to assign a permission to the annotation terminal; and the appointed terminal is a labeling terminal with an audit authority.
18. The apparatus of claim 14, further comprising a model building module to:
acquiring the stored marked data;
constructing sample data by using the marked data;
and constructing and training the target model by using the sample data through a preset machine learning algorithm.
19. The apparatus of claim 18, wherein the model building module is further configured to:
monitoring the stored marked data according to a preset time interval;
if the incremental data in the marked data reach a preset incremental data amount threshold value, constructing new sample data by using the marked data;
and constructing and training by using the new sample data through a preset machine learning algorithm to obtain a new target model.
20. The apparatus of claim 14, wherein the obtaining module is configured to implement at least one of the following steps:
collecting data to be marked by using a data point burying technology; and
and collecting data to be marked by utilizing a crawler technology.
21. The apparatus of claim 20, wherein the storage module is configured to store the acquired data after performing normalization processing on the acquired data.
22. The apparatus of claim 21, wherein if the acquired data is picture data, the storage module is configured to convert the picture data into a picture with a predetermined aspect ratio and compress the picture to a predetermined size.
23. The apparatus of claim 14, further comprising a deletion module;
and if the marked data comprises a preset cleaning mark, the deleting module is used for deleting the marked data.
24. The apparatus of claim 14, further comprising a deletion module;
if the data which is marked completely comprises a preset cleaning mark, the transceiver module is used for sending the marked data which comprises the preset cleaning mark to the appointed terminal and receiving an auditing result of the appointed terminal on the marked data which comprises the preset cleaning mark;
and the deleting module is used for determining whether to delete the marked data according to the auditing result of the marked data comprising the preset cleaning mark.
25. The apparatus of claim 14, wherein the data to be labeled is picture data, and the apparatus is applied to labeling picture data.
26. The apparatus according to claim 25, wherein the data to be labeled is test case picture data, and the apparatus is applied to label the test case picture data.
27. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-13.
28. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 13.
CN201910935375.XA 2019-09-29 2019-09-29 Data annotation implementation method and device, electronic equipment and storage medium Pending CN110750694A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910935375.XA CN110750694A (en) 2019-09-29 2019-09-29 Data annotation implementation method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910935375.XA CN110750694A (en) 2019-09-29 2019-09-29 Data annotation implementation method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110750694A true CN110750694A (en) 2020-02-04

Family

ID=69277452

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910935375.XA Pending CN110750694A (en) 2019-09-29 2019-09-29 Data annotation implementation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110750694A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111859862A (en) * 2020-07-22 2020-10-30 海尔优家智能科技(北京)有限公司 Text data labeling method and device, storage medium and electronic device
CN112989087A (en) * 2021-01-26 2021-06-18 腾讯科技(深圳)有限公司 Image processing method, device and computer readable storage medium
CN113344083A (en) * 2021-06-16 2021-09-03 安徽容知日新科技股份有限公司 Data labeling method and device and computing equipment
CN113591888A (en) * 2020-04-30 2021-11-02 上海禾赛科技有限公司 Point cloud data labeling network system and method for laser radar
CN113630408A (en) * 2021-08-03 2021-11-09 Oppo广东移动通信有限公司 Data processing method, data processing device, storage medium and server
CN113918713A (en) * 2021-09-22 2022-01-11 南京复保科技有限公司 Data annotation method and device, computer equipment and storage medium
WO2022052199A1 (en) * 2020-09-11 2022-03-17 南方科技大学 Data annotation method, network device, terminal, system and storage medium
CN115795076A (en) * 2023-01-09 2023-03-14 北京阿丘科技有限公司 Cross labeling method, device and equipment for image data and storage medium
CN116189066A (en) * 2021-11-18 2023-05-30 重庆药羚科技有限公司 Laboratory PPE compliance wearing monitoring method and system, storage medium and terminal

Citations (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2404040A (en) * 2003-07-16 2005-01-19 Canon Kk Lattice matching
US20080069437A1 (en) * 2006-09-13 2008-03-20 Aurilab, Llc Robust pattern recognition system and method using socratic agents
CN101334814A (en) * 2008-04-28 2008-12-31 华北电力大学 Automatic scanning and reading system and reading method
CN101859338A (en) * 2009-05-14 2010-10-13 深圳市海云天科技股份有限公司 Examination paper reading system and marking implementation method thereof
CN103530282A (en) * 2013-10-23 2014-01-22 北京紫冬锐意语音科技有限公司 Corpus tagging method and equipment
CN104820835A (en) * 2015-04-29 2015-08-05 岭南师范学院 Automatic examination paper marking method for examination papers
CN105741002A (en) * 2014-12-11 2016-07-06 中兴通讯股份有限公司 Online examination management method, apparatus and system
CN106056134A (en) * 2016-05-20 2016-10-26 重庆大学 Semi-supervised random forests classification method based on Spark
US20160321358A1 (en) * 2015-04-30 2016-11-03 Oracle International Corporation Character-based attribute value extraction system
CN106951925A (en) * 2017-03-27 2017-07-14 成都小多科技有限公司 Data processing method, device, server and system
CN107909114A (en) * 2017-11-30 2018-04-13 深圳地平线机器人科技有限公司 The method and apparatus of the model of training Supervised machine learning
US20190050428A1 (en) * 2017-08-08 2019-02-14 TuSimple System and method for image annotation
CN109359849A (en) * 2018-10-09 2019-02-19 上海起作业信息科技有限公司 Information processing method, device, medium and electronic equipment
CN109447860A (en) * 2018-10-16 2019-03-08 苏州友教习亦教育科技有限公司 Examination result and analysis system
CN109697274A (en) * 2017-10-20 2019-04-30 深圳市鹰硕技术有限公司 One kind sentencing volume method and sentences volume system
CN109784391A (en) * 2019-01-04 2019-05-21 杭州比智科技有限公司 Sample mask method and device based on multi-model
CN109828750A (en) * 2019-01-09 2019-05-31 西藏纳旺网络技术有限公司 Auto-configuration data buries method, apparatus, electronic equipment and storage medium a little
CN109857878A (en) * 2018-12-27 2019-06-07 深兰科技(上海)有限公司 Article mask method and device, electronic equipment and storage medium
CN110147852A (en) * 2019-05-29 2019-08-20 北京达佳互联信息技术有限公司 Method, apparatus, equipment and the storage medium of image recognition

Patent Citations (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2404040A (en) * 2003-07-16 2005-01-19 Canon Kk Lattice matching
US20080069437A1 (en) * 2006-09-13 2008-03-20 Aurilab, Llc Robust pattern recognition system and method using socratic agents
CN101334814A (en) * 2008-04-28 2008-12-31 华北电力大学 Automatic scanning and reading system and reading method
CN101859338A (en) * 2009-05-14 2010-10-13 深圳市海云天科技股份有限公司 Examination paper reading system and marking implementation method thereof
CN103530282A (en) * 2013-10-23 2014-01-22 北京紫冬锐意语音科技有限公司 Corpus tagging method and equipment
CN105741002A (en) * 2014-12-11 2016-07-06 中兴通讯股份有限公司 Online examination management method, apparatus and system
CN104820835A (en) * 2015-04-29 2015-08-05 岭南师范学院 Automatic examination paper marking method for examination papers
US20160321358A1 (en) * 2015-04-30 2016-11-03 Oracle International Corporation Character-based attribute value extraction system
CN106056134A (en) * 2016-05-20 2016-10-26 重庆大学 Semi-supervised random forests classification method based on Spark
CN106951925A (en) * 2017-03-27 2017-07-14 成都小多科技有限公司 Data processing method, device, server and system
US20190050428A1 (en) * 2017-08-08 2019-02-14 TuSimple System and method for image annotation
CN109697274A (en) * 2017-10-20 2019-04-30 深圳市鹰硕技术有限公司 One kind sentencing volume method and sentences volume system
CN107909114A (en) * 2017-11-30 2018-04-13 深圳地平线机器人科技有限公司 The method and apparatus of the model of training Supervised machine learning
CN109359849A (en) * 2018-10-09 2019-02-19 上海起作业信息科技有限公司 Information processing method, device, medium and electronic equipment
CN109447860A (en) * 2018-10-16 2019-03-08 苏州友教习亦教育科技有限公司 Examination result and analysis system
CN109857878A (en) * 2018-12-27 2019-06-07 深兰科技(上海)有限公司 Article mask method and device, electronic equipment and storage medium
CN109784391A (en) * 2019-01-04 2019-05-21 杭州比智科技有限公司 Sample mask method and device based on multi-model
CN109828750A (en) * 2019-01-09 2019-05-31 西藏纳旺网络技术有限公司 Auto-configuration data buries method, apparatus, electronic equipment and storage medium a little
CN110147852A (en) * 2019-05-29 2019-08-20 北京达佳互联信息技术有限公司 Method, apparatus, equipment and the storage medium of image recognition

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
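NASSER ALALWAN et al.: "Generating OWL Ontology for Database Integration", 2009 THIRD INTERNATIONAL CONFERENCE ON ADVANCES IN SEMANTIC PROCESSING *
LI Ming (李明) et al.: "Deep Web data annotation method based on result schema" (基于结果模式的Deep Web数据标注方法), Journal of Computer Applications (《计算机应用》) *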

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113591888A (en) * 2020-04-30 2021-11-02 上海禾赛科技有限公司 Point cloud data labeling network system and method for laser radar
CN111859862A (en) * 2020-07-22 2020-10-30 海尔优家智能科技(北京)有限公司 Text data labeling method and device, storage medium and electronic device
CN111859862B (en) * 2020-07-22 2024-03-22 海尔优家智能科技(北京)有限公司 Text data labeling method and device, storage medium and electronic device
WO2022052199A1 (en) * 2020-09-11 2022-03-17 南方科技大学 Data annotation method, network device, terminal, system and storage medium
CN112989087A (en) * 2021-01-26 2021-06-18 腾讯科技(深圳)有限公司 Image processing method, device and computer readable storage medium
CN112989087B (en) * 2021-01-26 2023-01-31 腾讯科技(深圳)有限公司 Image processing method, device and computer readable storage medium
CN113344083A (en) * 2021-06-16 2021-09-03 安徽容知日新科技股份有限公司 Data labeling method and device and computing equipment
CN113630408A (en) * 2021-08-03 2021-11-09 Oppo广东移动通信有限公司 Data processing method, data processing device, storage medium and server
CN113630408B (en) * 2021-08-03 2023-06-16 Oppo广东移动通信有限公司 Data processing method, device, storage medium and server
CN113918713A (en) * 2021-09-22 2022-01-11 南京复保科技有限公司 Data annotation method and device, computer equipment and storage medium
CN116189066A (en) * 2021-11-18 2023-05-30 重庆药羚科技有限公司 Laboratory PPE compliance wearing monitoring method and system, storage medium and terminal
CN115795076A (en) * 2023-01-09 2023-03-14 北京阿丘科技有限公司 Cross labeling method, device and equipment for image data and storage medium

Similar Documents

Publication Publication Date Title
CN110750694A (en) Data annotation implementation method and device, electronic equipment and storage medium
US11676223B2 (en) Media management system
CN106844217B (en) Method and device for embedding point of applied control and readable storage medium
WO2020232879A1 (en) Risk conduction association map optimization method and apparatus, computer device and storage medium
CN110019616B (en) POI (Point of interest) situation acquisition method and equipment, storage medium and server thereof
US9710528B2 (en) System and method for business intelligence data testing
CN107622008B (en) Traversal method and device for application page
US9411917B2 (en) Methods and systems for modeling crowdsourcing platform
Mans et al. Business process mining success
US11004186B2 (en) Parcel change detection
WO2020228283A1 (en) Feature extraction method and apparatus, and computer readable storage medium
CN110674360B (en) Tracing method and system for data
CN109726105A (en) Test data building method, device, equipment and storage medium
CN112711526A (en) UI test method, device, equipment and storage medium
CN112818162A (en) Image retrieval method, image retrieval device, storage medium and electronic equipment
TW201843609A (en) System and method for learning-based group tagging
CN113868498A (en) Data storage method, electronic device, device and readable storage medium
CN110879780A (en) Page abnormity detection method and device, electronic equipment and readable storage medium
CN113688288A (en) Data association analysis method and device, computer equipment and storage medium
US20220327452A1 (en) Method for automatically updating unit cost of inspection by using comparison between inspection time and work time of crowdsourcing-based project for generating artificial intelligence training data
CN113448834A (en) Buried point testing method and device, electronic equipment and storage medium
CN113779261A (en) Knowledge graph quality evaluation method and device, computer equipment and storage medium
Zhang et al. Using knowledge-based systems to manage quality attributes in software product lines
CN116501979A (en) Information recommendation method, information recommendation device, computer equipment and computer readable storage medium
CN111522570B (en) Target library updating method and device, electronic equipment and machine-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200204
