CN116467598A - Automatic data labeling method, system, equipment and storage medium - Google Patents

Automatic data labeling method, system, equipment and storage medium

Info

Publication number
CN116467598A
Authority
CN
China
Prior art keywords
labeling
annotation
automatic
data
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310401758.5A
Other languages
Chinese (zh)
Inventor
邹安平
徐春香
余跃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peng Cheng Laboratory
Original Assignee
Peng Cheng Laboratory
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peng Cheng Laboratory
Priority to CN202310401758.5A
Publication of CN116467598A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/40 Software arrangements specially adapted for pattern recognition, e.g. user interfaces or toolboxes therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481 Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481 Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F3/0482 Interaction with lists of selectable items, e.g. menus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484 Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45504 Abstract machines for programme code execution, e.g. Java virtual machine [JVM], interpreters, emulators
    • G06F9/45508 Runtime interpretation or emulation, e.g. emulator loops, bytecode interpretation
    • G06F9/45512 Command shells
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention provide an automatic data labeling method, system, equipment and storage medium, and relate to the technical field of artificial intelligence. The method comprises: displaying an input selection identifier on a display interface of a labeling main system; receiving a target data set and acquiring a target labeling task in response to a first trigger operation on the input selection identifier; in response to a second trigger operation on a labeling start identifier, calling a labeling model algorithm module by means of a labeling agent module and sending a preset input address of the target data set to a labeling script, wherein the labeling script labels the target data set using the automatic labeling model and stores the generated target labeling data at a preset output address; and acquiring, by the labeling agent module, the target labeling data stored at the preset output address and displaying it on the display interface of the labeling main system in a preset display mode. The learning cost of automatic labeling is thereby reduced and its labeling efficiency is improved.

Description

Automatic data labeling method, system, equipment and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to an automatic data labeling method, an automatic data labeling system, automatic data labeling equipment and a storage medium.
Background
Data is a core element of artificial intelligence. Data annotation is the process of converting raw, unstructured data, including speech, pictures, text and video, into machine-recognizable information. Labeling is either manual or automatic. Compared with manual labeling, automatic labeling applies an existing automatic labeling model and generates labeling results for the raw data through model inference, which saves a large amount of manual work.
In the related art, an automatic labeling model is tightly coupled to its underlying model algorithm, and most automatic labeling models are developed for a specific scene, such as medical imaging, face labeling, video labeling or text labeling. When the scene changes, an automatic labeling model for the new scene must be acquired again, or the existing model must be modified, adapted or redeveloped. A user therefore needs to know model-specific details for many scenes, such as input and output parameter information. When large amounts of data of different types need to be labeled, the learning cost of automatic labeling is high and labeling efficiency is low.
Disclosure of Invention
The main purpose of the embodiments of the present application is to provide an automatic data labeling method, system, equipment and storage medium that reduce the learning cost of automatic labeling and improve labeling efficiency.
To achieve the above object, a first aspect of the embodiments of the present application provides an automatic data labeling method applied to an automatic data labeling system. The automatic data labeling system comprises a labeling main system and N labeling agent modules, each labeling agent module having an associated labeling model algorithm module; N is the number of labeling categories and is an integer greater than or equal to 1; the labeling model algorithm module comprises a labeling script of an automatic labeling model registered in the automatic data labeling system in advance. The method comprises the following steps:
displaying an input selection identifier on a display interface of the labeling main system;
in response to a first trigger operation on the input selection identifier, receiving a target data set and acquiring a target labeling task, the target labeling task being used to select the automatic labeling model in the labeling model algorithm module;
in response to a second trigger operation on a labeling start identifier, calling the labeling model algorithm module by means of the labeling agent module and sending a preset input address of the target data set to the labeling script, wherein the labeling script labels the target data set using the automatic labeling model to generate target labeling data and stores the target labeling data at a preset output address in a preset output format;
acquiring, by the labeling agent module, the target labeling data stored at the preset output address and sending the target labeling data to the labeling main system; and
displaying the target labeling data on the display interface of the labeling main system in a preset display mode.
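The steps above can be sketched roughly as follows. This is only an illustrative sketch, not the patent's implementation: the class names `LabelingMainSystem` and `LabelingAgent`, the toy script `toy_detection_script`, the JSON file format and the temporary-directory addresses are all assumptions introduced here.

```python
from __future__ import annotations

import json
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable

# A labeling script reads from an input address and writes to an output address.
LabelingScript = Callable[[Path, Path], None]


@dataclass
class LabelingAgent:
    """Hypothetical stand-in for one labeling agent module."""
    script: LabelingScript
    input_address: Path   # preset input address (assumed to be a file path)
    output_address: Path  # preset output address (assumed to be a file path)

    def run(self, dataset: list[str]) -> list[dict]:
        # Place the target data set at the preset input address.
        self.input_address.write_text(json.dumps(dataset))
        # Call the labeling script, passing it the preset addresses.
        self.script(self.input_address, self.output_address)
        # Fetch the target labeling data from the preset output address.
        return json.loads(self.output_address.read_text())


def toy_detection_script(src: Path, dst: Path) -> None:
    """Toy 'model': labels every item with the same fixed bounding box."""
    items = json.loads(src.read_text())
    labels = [{"item": name, "bbox": [0, 0, 10, 10]} for name in items]
    dst.write_text(json.dumps(labels))


class LabelingMainSystem:
    """Hypothetical main system: maps a labeling task to its agent."""

    def __init__(self, agents: dict[str, LabelingAgent]) -> None:
        self.agents = agents

    def label(self, task: str, dataset: list[str]) -> list[dict]:
        # The selected task decides which agent (and hence which model) runs.
        return self.agents[task].run(dataset)


def demo() -> list[dict]:
    with tempfile.TemporaryDirectory() as tmp:
        agent = LabelingAgent(
            toy_detection_script, Path(tmp) / "in.json", Path(tmp) / "out.json"
        )
        main_system = LabelingMainSystem({"target_detection": agent})
        return main_system.label("target_detection", ["a.jpg", "b.jpg"])
```

In this sketch the main system never touches the model directly; it only knows the task name, which mirrors the role the text assigns to the labeling agent module.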
In one embodiment, registering the automatic labeling model in the automatic data labeling system comprises:
displaying a model input identifier on the display interface of the labeling main system, and acquiring the automatic labeling model to be registered in response to a third trigger operation on the model input identifier; and
generating a labeling model algorithm module according to the automatic labeling model, and generating the labeling agent module of the labeling model algorithm module.
In an embodiment, generating the labeling model algorithm module according to the automatic labeling model and generating the labeling agent module of the labeling model algorithm module comprises:
acquiring the labeling category of the automatic labeling model, together with the preset output format, preset input address and preset output address of that labeling category;
generating the labeling script of the automatic labeling model, the labeling script being used to generate output data in the preset output format from the input data at the preset input address and to store the output data at the preset output address; and
generating the labeling model algorithm module according to the labeling script, and configuring the labeling agent module of the labeling model algorithm module according to parameter information of the labeling model algorithm module.
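A minimal sketch of this registration flow is given below, under stated assumptions: storage is modeled as an in-memory dictionary keyed by address, and `CATEGORY_PRESETS`, `AlgorithmModule`, `AgentModule`, `generate_script` and `register_model` are hypothetical names that do not come from the source.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical presets per labeling category; all addresses are made up.
CATEGORY_PRESETS = {
    "text": {
        "input_address": "/data/in/text",
        "output_address": "/data/out/text",
        "output_format": "json",
    },
}

Storage = dict  # maps an address to the data stored there (assumption)


@dataclass
class AlgorithmModule:
    category: str
    script: Callable[[Storage], None]  # the generated labeling script


@dataclass
class AgentModule:
    module: AlgorithmModule
    input_address: str
    output_address: str
    output_format: str


def generate_script(model: Callable[[Any], Any], preset: dict) -> Callable[[Storage], None]:
    """Wrap a model so it reads the preset input and writes the preset output."""
    def script(storage: Storage) -> None:
        inputs = storage[preset["input_address"]]
        storage[preset["output_address"]] = {
            "format": preset["output_format"],
            "labels": [model(item) for item in inputs],
        }
    return script


def register_model(category: str, model: Callable[[Any], Any], registry: dict) -> AgentModule:
    # Step 1: look up the category's preset output format and addresses.
    preset = CATEGORY_PRESETS[category]
    # Steps 2 and 3: generate the script, then the algorithm module around it.
    module = AlgorithmModule(category, generate_script(model, preset))
    # Step 3 (continued): configure the agent from the module's parameters.
    agent = AgentModule(module=module, **preset)
    registry[category] = agent
    return agent
```

For example, `register_model("text", str.upper, {})` registers a toy "model" that upper-cases each input string.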
In an embodiment, the parameter information comprises a task interface address, a registration interface address, an output interface address and a call command; configuring the labeling agent module of the labeling model algorithm module according to the parameter information of the labeling model algorithm module comprises:
configuring the task interface address of the automatic labeling model in the labeling main system, the task interface address being the address at which the labeling main system selects the labeling agent module according to the first trigger operation;
configuring the registration interface address of the automatic labeling model in the labeling agent module, the registration interface address being the address at which the labeling agent module calls the labeling script;
configuring the output interface address of the automatic labeling model in the labeling agent module, the output interface address being the address at which the labeling main system receives output data in the preset output format;
configuring the preset input address, the preset output format and the preset output address of the automatic labeling model in the labeling agent module; and
configuring a call command of the labeling script in the labeling agent module.
In an embodiment, the parameter information further comprises model information; configuring the labeling agent module of the labeling model algorithm module according to the parameter information of the labeling model algorithm module further comprises:
configuring the model information of the automatic labeling model in the labeling main system, the model information being displayed in the input selection identifier and used to guide an operation object to perform the first trigger operation.
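The parameter information listed above could be held in a single configuration record, as in the sketch below. The field names, the `{input}`/`{output}` placeholder convention in the call command, and every address shown are assumptions made for illustration; the source does not specify a concrete format.

```python
from __future__ import annotations

import shlex
from dataclasses import dataclass, field


@dataclass
class AgentConfig:
    """Hypothetical container for one agent's parameter information."""
    task_interface_address: str       # configured in the labeling main system
    register_interface_address: str   # where the agent calls the labeling script
    output_interface_address: str     # where the main system receives output data
    preset_input_address: str
    preset_output_address: str
    preset_output_format: str
    call_command: str                 # command template for the labeling script
    model_info: dict = field(default_factory=dict)  # shown to guide the user

    def argv(self) -> list[str]:
        """Fill the preset addresses into the call command and split it."""
        command = self.call_command.format(
            input=self.preset_input_address,
            output=self.preset_output_address,
        )
        return shlex.split(command)


# Example values; every address and name here is invented.
cfg = AgentConfig(
    task_interface_address="http://main.example/tasks/detection",
    register_interface_address="http://agent.example/register",
    output_interface_address="http://main.example/outputs/detection",
    preset_input_address="/data/in",
    preset_output_address="/data/out",
    preset_output_format="json",
    call_command="python label.py --input {input} --output {output}",
    model_info={"name": "toy-detector", "scene": "street"},
)
```

Keeping the call command as a template means the agent can launch any script without knowing its internals, as long as the script honors the preset addresses.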
In an embodiment, the automatic data labeling system comprises one or more labeling model algorithm modules, each labeling model algorithm module comprising an automatic labeling model and the labeling script of that automatic labeling model, the labeling scripts being used to execute labeling tasks of different grading types; the input selection identifier comprises a data input box and a task selection box, the task selection box comprising a grading type identifier;
receiving the target data set and acquiring the target labeling task in response to the first trigger operation on the input selection identifier comprises:
receiving the target data set in response to a data input operation on the data input box;
selecting the labeling category in response to a task selection operation on the task selection box; and
in response to a grading selection operation on the grading type identifier, selecting, in the labeling model algorithm module corresponding to the grading type, the labeling script associated with the grading selection operation.
In an embodiment, each automatic labeling model in the labeling model algorithm module runs in a separate container or virtual machine; the grading type identifier comprises an evaluation identifier and/or a scene identifier; selecting, in the labeling model algorithm module corresponding to the grading type, the labeling script associated with the grading selection operation comprises:
displaying a grading description of each automatic labeling model, the grading description comprising a labeling score and/or a scene description, wherein the labeling score corresponds to the evaluation identifier and the scene description corresponds to the scene identifier;
determining the labeling model algorithm module corresponding to the grading type in response to one or more grading selection operations based on the grading description; and
determining, in the labeling model algorithm module, one or more labeling scripts associated with the grading selection operation.
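One way to picture the grading selection is as a filter over registered models by score and scene, as sketched below. The `ModelEntry` record, the numeric score scale and the `select_scripts` function are hypothetical; the source only states that a labeling score and/or a scene description is displayed and used for selection.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ModelEntry:
    """Hypothetical grading description of one automatic labeling model."""
    name: str
    category: str
    score: float   # labeling score (evaluation identifier)
    scene: str     # scene description (scene identifier)
    script: str    # identifier of the associated labeling script


def select_scripts(
    entries: list[ModelEntry],
    category: str,
    min_score: float | None = None,
    scene: str | None = None,
) -> list[str]:
    """Return the labeling scripts matching the user's grading selections."""
    chosen = []
    for entry in entries:
        if entry.category != category:
            continue
        if min_score is not None and entry.score < min_score:
            continue
        if scene is not None and entry.scene != scene:
            continue
        chosen.append(entry.script)
    return chosen


# Invented example entries.
ENTRIES = [
    ModelEntry("det-a", "target_detection", 0.91, "street", "det_a.py"),
    ModelEntry("det-b", "target_detection", 0.78, "indoor", "det_b.py"),
    ModelEntry("seg-a", "target_segmentation", 0.88, "street", "seg_a.py"),
]
```

Because both filters are optional, the same function covers selection by evaluation only, by scene only, or by both, matching the "and/or" wording of the embodiment.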
In an embodiment, the labeling agent module is further configured to determine the execution progress of the labeling script and send it to the labeling main system, which displays the execution progress on its display interface.
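Progress forwarding of this kind might look like the following sketch, in which the labeling script is assumed to be a generator that yields its position after each item and the agent passes a fraction to a callback supplied by the main system. Both function names and the callback mechanism are assumptions, not details from the source.

```python
from __future__ import annotations

from typing import Callable, Iterator


def labeling_script(items: list[str]) -> Iterator[tuple[int, int]]:
    """Hypothetical script that yields (done, total) after each item."""
    total = len(items)
    for done, _item in enumerate(items, start=1):
        # Model inference for one item would happen here.
        yield done, total


def run_with_progress(items: list[str], report: Callable[[float], None]) -> None:
    """Agent side: run the script and forward progress to the main system."""
    for done, total in labeling_script(items):
        report(done / total)
```

The main system would pass its own display-update function as `report`; in a quick check, passing `list.append` of an empty list collects the reported fractions.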
In an embodiment, the labeling category comprises at least one of: target detection labeling, target tracking labeling, target segmentation labeling or text labeling; selecting the labeling category in response to the task selection operation on the task selection box comprises:
displaying a target detection labeling identifier, a target tracking labeling identifier, a target segmentation labeling identifier or a text labeling identifier in the task selection box;
determining the task selection operation in response to a trigger state of at least one of the target detection labeling identifier, the target tracking labeling identifier, the target segmentation labeling identifier or the text labeling identifier; and
determining the labeling category according to the task selection operation.
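As a small illustration of mapping trigger states to labeling categories, the sketch below models the four categories as an enumeration and the trigger states as a dictionary of booleans. The enumeration, the dictionary representation and the error raised when nothing is selected are all assumptions.

```python
from __future__ import annotations

from enum import Enum


class LabelingCategory(Enum):
    TARGET_DETECTION = "target detection"
    TARGET_TRACKING = "target tracking"
    TARGET_SEGMENTATION = "target segmentation"
    TEXT = "text"


def categories_from_triggers(
    trigger_states: dict[LabelingCategory, bool],
) -> list[LabelingCategory]:
    """Return the categories whose identifiers are in the triggered state."""
    selected = [c for c in LabelingCategory if trigger_states.get(c, False)]
    if not selected:
        # Assumed behavior: at least one identifier must be triggered.
        raise ValueError("no labeling category selected")
    return selected
```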
To achieve the above object, a second aspect of the embodiments of the present application provides an automatic data labeling system comprising a labeling main system and N labeling agent modules, each labeling agent module having an associated labeling model algorithm module; N is the number of labeling categories and is an integer greater than or equal to 1; the labeling model algorithm module comprises a labeling script of an automatic labeling model; and the system is configured to perform the automatic data labeling method according to any embodiment of the first aspect.
To achieve the above object, a third aspect of the embodiments of the present application provides an electronic device comprising a memory and a processor, the memory storing a computer program, and the processor implementing the method according to the first aspect when executing the computer program.
To achieve the above object, a fourth aspect of the embodiments of the present application provides a storage medium, the storage medium being a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method according to the first aspect.
According to the automatic data labeling method, system, equipment and storage medium provided by the embodiments of the present application, an input selection identifier is displayed on a display interface of a labeling main system; a target data set is received and a target labeling task is acquired in response to a first trigger operation on the input selection identifier; in response to a second trigger operation on a labeling start identifier, a labeling model algorithm module is called by means of a labeling agent module and a preset input address of the target data set is sent to a labeling script; the labeling script labels the target data set using the automatic labeling model and stores the generated target labeling data at a preset output address; the labeling agent module then acquires the target labeling data stored at the preset output address, and the data is displayed on the display interface of the labeling main system in a preset display mode. In the embodiments of the present application, a labeling model algorithm module is provided for the automatic labeling models of each labeling category, and the labeling agent module transmits the input and output data of the automatic labeling models through standardized interfaces. A user only needs to upload the target data set and perform the selection operations related to automatic labeling on the display interface of the labeling main system in order to complete automatic labeling of the corresponding labeling category and obtain the labeled result data from the same display interface. The user does not need to know the underlying implementation logic of the automatic labeling algorithm; simple interface interaction is sufficient.
In addition, the automatic data labeling system integrates automatic labeling models for a plurality of different labeling categories, so it can meet the need to label large amounts of data of different types, reducing the learning cost of automatic labeling while improving its labeling efficiency.
Drawings
FIG. 1 is a schematic diagram of an automatic data labeling system according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of a structural unit of an automatic data labeling system according to another embodiment of the present invention.
FIG. 3 is a schematic diagram of a display interface of a labeling main system of an automatic data labeling system according to another embodiment of the present invention.
FIG. 4a to FIG. 4b are schematic diagrams of a task selection box of a labeling main system of an automatic data labeling system according to another embodiment of the present invention.
FIG. 5 is a flowchart of registration of an automatic labeling model for an automatic data labeling system according to another embodiment of the present invention.
FIG. 6 is a schematic diagram of model input identifiers of a display interface of a labeling main system of an automatic data labeling system according to yet another embodiment of the present invention.
FIG. 7 is a flowchart of step S420 in FIG. 5.
FIG. 8 is a flowchart of step S423 in FIG. 7.
FIG. 9 is a flowchart of an automatic data labeling method according to an embodiment of the present invention.
FIG. 10 is a flowchart of step S120 in FIG. 9.
FIG. 11 is a flowchart of step S122 in FIG. 10.
FIG. 12 is a flowchart of step S123 in FIG. 10.
FIG. 13 is a schematic illustration of a labeling script of an automatic data labeling system according to yet another embodiment of the present invention.
FIG. 14 is a schematic diagram of a display interface of an automatic data labeling system according to a further embodiment of the invention.
FIG. 15 is a schematic diagram of a preset display mode of a display interface of an automatic data labeling system according to another embodiment of the present invention.
FIG. 16 is a schematic workflow diagram of an automatic data labeling system according to an embodiment of the present invention.
FIG. 17 is a schematic diagram of data information flow of an automatic data labeling system according to an embodiment of the present invention.
FIG. 18 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
It should be noted that although functional block division is performed in a device diagram and a logic sequence is shown in a flowchart, in some cases, the steps shown or described may be performed in a different order than the block division in the device, or in the flowchart.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein is for the purpose of describing embodiments of the invention only and is not intended to be limiting of the invention.
First, several terms used in the present invention are explained:
Artificial intelligence (AI): a new technical science that studies and develops theories, methods, technologies and application systems for simulating, extending and expanding human intelligence. Artificial intelligence is a branch of computer science that attempts to understand the nature of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence; research in this field includes robotics, language recognition, image recognition, natural language processing and expert systems. Artificial intelligence can simulate the information processes of human consciousness and thinking. Artificial intelligence is also the theory, method, technique and application system that uses a digital computer or a digital-computer-controlled machine to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use knowledge to obtain optimal results.
Data is a core element of artificial intelligence. Data annotation is the process of converting raw, unstructured data, including speech, pictures, text and video, into machine-recognizable information. Labeling is either manual or automatic. Compared with manual labeling, automatic labeling applies an existing automatic labeling model and generates labeling results for the raw data through model inference, which saves a large amount of manual work.
Artificial intelligence has diverse application fields and tasks, such as medical imaging, face recognition, target detection, target tracking and target segmentation. In the related art, an automatic labeling model is tightly coupled to its underlying model algorithm, and most automatic labeling models are developed for a specific scene, such as medical imaging, face labeling, video labeling or text labeling. Different automatic labeling models have different input and output parameters. Take target detection and pedestrian re-identification as two examples: the input of the target detection model is a rectangular frame, whereas the input of the pedestrian re-identification model is a plurality of small pictures and an object to be identified, and its output is the similarity between each picture and the target object. Therefore, when the scene changes, an automatic labeling model for the new scene must be acquired again, or the existing model must be modified, adapted or redeveloped. A user needs to know model-specific details for many scenes, such as input and output parameter information. When large amounts of data of different types need to be labeled, the learning cost of automatic labeling is high and labeling efficiency is low.
Based on this, the embodiments of the invention provide an automatic data labeling method, system, device and storage medium in which a labeling model algorithm module is provided for the automatic labeling models of each labeling category, and the labeling agent module transmits the input and output data of the automatic labeling models through standardized interfaces. A user only needs to upload the target data set and perform the selection operations related to automatic labeling on the display interface of the labeling main system in order to complete automatic labeling of the corresponding labeling category and obtain the labeling result data from the same display interface. The user does not need to know the underlying implementation logic of the automatic labeling algorithm; simple interface interaction is sufficient. In addition, the automatic data labeling system integrates automatic labeling models for a plurality of different labeling categories, so it can meet the need to label large amounts of data of different types, reducing the learning cost of automatic labeling while improving its labeling efficiency.
The automatic data labeling method, system, equipment and storage medium provided by the embodiments of the invention are described in detail through the following embodiments.
The embodiments of the invention can acquire and process related data based on artificial intelligence technology. Artificial Intelligence (AI) is the theory, method, technique and application system that uses a digital computer or a digital-computer-controlled machine to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines so that the machines have the functions of perception, reasoning and decision-making.
Artificial intelligence is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing and machine learning/deep learning.
The embodiments of the invention provide an automatic data labeling method that relates to the technical field of artificial intelligence, and in particular to the technical field of data mining. The method can be applied in a terminal, in a server, or in a computer program running in the terminal or the server. For example, the computer program may be a native program or a software module in an operating system; it may be a native application (APP), i.e. a program that must be installed in an operating system to run, such as a client supporting automatic labeling; it may be an applet, i.e. a program that only needs to be downloaded into a browser environment to run; or it may be an applet that can be embedded in any APP. In general, the computer program may be any form of application, module or plug-in. The terminal communicates with the server through a network. The automatic data labeling method may be performed by the terminal, by the server, or by the terminal and the server in cooperation.
In some embodiments, the terminal may be a smart phone, a tablet, a notebook computer, a desktop computer, a smart watch or the like. The server may be an independent server, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms. The server may also be a service node in a blockchain system, in which the service nodes form a peer-to-peer (P2P) network; the P2P protocol is an application layer protocol that runs on top of the Transmission Control Protocol (TCP). The server side of the automatic labeling system may be deployed on the server, through which interaction with the terminal takes place; for example, corresponding software may be installed on the server, and the software may be an application implementing the automatic data labeling method, although it is not limited to this form. The terminal and the server may be connected by Bluetooth, USB (Universal Serial Bus), a network or another communication connection, which is not limited herein.
The invention is operational with numerous general purpose or special purpose computer system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In the embodiments of the present application, when related processing is required according to user information, user behavior data, user history data, user location information, and other data related to user identity or characteristics, permission or consent of the user is obtained first, and the collection, use, processing, and the like of the data comply with related laws and regulations and standards of related countries and regions. In addition, when the embodiment of the application needs to acquire the sensitive personal information of the user, the independent permission or independent consent of the user is acquired through a popup window or a jump to a confirmation page or the like, and after the independent permission or independent consent of the user is explicitly acquired, necessary user related data for enabling the embodiment of the application to normally operate is acquired.
An automatic data annotation system in an embodiment of the present application will be described first. FIG. 1 is a schematic diagram of an automatic data annotation system according to one embodiment.
Referring to FIG. 1, an automatic data annotation system 10 includes: the annotation host system 100 and N annotation agent modules 200, each annotation agent module 200 includes an annotation model algorithm module 300 associated therewith, where N is the number of annotation categories and N is an integer greater than or equal to 1. The labeling agent modules 200 and the labeling model algorithm modules 300 are in one-to-one correspondence, and when one labeling model algorithm module 300 is newly added, one labeling agent module 200 needs to be correspondingly added.
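The one-to-one correspondence between labeling agent modules and labeling model algorithm modules can be sketched as a small registry. This is only an illustrative sketch; the class names, field names, and the use of the model name as a key are assumptions, not part of the embodiment.

```python
from dataclasses import dataclass, field


@dataclass
class LabelingModelAlgorithmModule:
    """Hypothetical stand-in for a labeling model algorithm module 300."""
    model_name: str
    labeling_category: str


@dataclass
class LabelingAgentModule:
    """Hypothetical stand-in for a labeling agent module 200."""
    algorithm_module: LabelingModelAlgorithmModule


@dataclass
class AutomaticDataLabelingSystem:
    """Holds N agents; each agent wraps exactly one algorithm module."""
    agents: dict = field(default_factory=dict)

    def add_algorithm_module(self, module):
        # Adding a new algorithm module always adds one matching agent.
        if module.model_name in self.agents:
            raise ValueError(f"model already registered: {module.model_name}")
        agent = LabelingAgentModule(algorithm_module=module)
        self.agents[module.model_name] = agent
        return agent


system = AutomaticDataLabelingSystem()
system.add_algorithm_module(LabelingModelAlgorithmModule("m1", "target detection"))
system.add_algorithm_module(LabelingModelAlgorithmModule("m2", "text labeling"))
```

In this sketch the number of agents always equals the number of registered algorithm modules, which mirrors the rule that a new labeling model algorithm module requires a new labeling agent module.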
In the above embodiment, a labeling category is a class of scene that requires automatic labeling, for example target detection labeling, target tracking labeling, target segmentation labeling, or text labeling. Target detection labeling includes: 1) labeling in an image classification scene, i.e. given an image, labeling the target categories it contains, such as vehicles, pedestrians, street lamps, and animals; 2) labeling in an image positioning scene, i.e. given an image, labeling the positions in the image of the targets it contains. Target tracking labeling means, given a sequence of consecutive images and a target, labeling the position of the target in each image. Target segmentation labeling means determining which target or scene each pixel in a given image belongs to. Text labeling means, given text information, labeling the emotion information, part-of-speech information, translation result, or the like contained in it. It will be appreciated that the labeling categories and scenes described in this embodiment are merely examples and do not limit the labeling categories in the embodiments of the present application.
In an embodiment, the labeling agent module 200 is connected to the labeling main system 100 and the labeling model algorithm module 300, and is capable of receiving a target labeling task from a user sent by the labeling main system 100, then sending the target labeling task to the labeling model algorithm module 300 associated with the target labeling task for automatic labeling, and then sending a labeling result obtained by automatic labeling to the labeling main system 100 for display.
In one embodiment, the annotation model algorithm module 300 includes an annotation script 320 of an automatic annotation model 310 that is pre-registered with the automatic data annotation system 10. It will be appreciated that an automated data annotation system 10 may comprise one or more annotation model algorithm modules 300, each annotation model algorithm module 300 comprising a corresponding automated annotation model, each automated annotation model having a corresponding annotation script 320. The labeling script 320 is an execution script of the automatic labeling model 310, and the automatic labeling process can be performed by using the automatic labeling model by running the execution script. It will be appreciated that the process of generating the corresponding execution script by the automatic labeling model 310 may refer to the script generation process in the related art, which is not specifically limited in the embodiments of the present application.
FIG. 2 is yet another schematic diagram of an automated data labeling system.
In one embodiment, referring to fig. 2, the labeling main system 100 includes a display interface 110 through which interaction with the user is implemented. Referring to fig. 3, the display interface 110 includes an input selection identifier 111 and a result display area 112, and the user can operate the input selection identifier 111 through a first trigger operation. The first trigger operation includes clicking or touching. For example, after the input selection identifier 111 in the display interface 110 is clicked, a data input box 113 and a task selection box 114 pop up, where the data input box 113 receives the target data set that the user wants to label automatically, and the task selection box 114 guides the user to select an automatic labeling model.
In one embodiment, the hierarchical type identifier may be selected in the task selection box 114 via a drop-down button, so as to select a different automatic labeling model, where the hierarchical type identifier includes: an evaluation identification and/or a scene identification.
Referring to fig. 4a and 4b, the hierarchical type identifier in the task selection box 114 is an evaluation identifier, which guides the user to select an automatic labeling model with a better evaluation. For example, in fig. 4a, the evaluation identifier distinguishes automatic labeling models by labeling score, where the labeling score may be obtained from users scoring the labeling accuracy and labeling efficiency of the automatic labeling model they used after a labeling task. For example, in fig. 4a, the statistically obtained labeling score of a level A automatic labeling model is higher than that of a level B automatic labeling model, and that of a level B automatic labeling model is higher than that of a level C automatic labeling model, where level A, level B, and level C are hierarchical type identifiers. It will be appreciated that if an automatic labeling model is being used for labeling for the first time, its labeling score is an initial value.
Referring to fig. 4b, the hierarchical type identifier in the task selection box 114 is a scene identifier, which distinguishes automatic labeling models of different labeling fields. For example, the same target detection scene may involve different labeling fields, such as target detection in medical images, target detection of traffic vehicles, and target detection of faces. Automatic labeling models used in different labeling fields have different model parameters, and selecting the automatic labeling model that matches the field improves labeling accuracy. In this embodiment, the hierarchical type identifier guides the user to select a refined labeling field, which improves the user experience. For example, in fig. 4b, the hierarchical type identifiers include medical image detection, traffic vehicle detection, face detection, and the like. It will be appreciated that the hierarchical type identifiers may be set according to actual requirements.
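One way the evaluation identifier could be derived from user scores is sketched below. The embodiment does not state how scores are aggregated or where the level boundaries lie, so the averaging, the initial value of 0, and the even split of the ranking across levels A, B, and C are all assumptions made for illustration.

```python
from statistics import mean

INITIAL_SCORE = 0.0  # assumed initial value for models that have no ratings yet


def labeling_score(user_scores):
    """Average the user ratings; fall back to the initial value if there are none."""
    return mean(user_scores) if user_scores else INITIAL_SCORE


def assign_levels(ratings, levels=("A", "B", "C")):
    """Rank models by labeling score (highest first) and spread them over the levels."""
    ranked = sorted(ratings, key=lambda name: labeling_score(ratings[name]), reverse=True)
    per_level = -(-len(ranked) // len(levels))  # ceiling division
    return {name: levels[i // per_level] for i, name in enumerate(ranked)}


# Hypothetical ratings collected after labeling tasks.
ratings = {"m1": [98, 96], "m2": [80, 85], "m3": [60]}
levels = assign_levels(ratings)
```

With these example ratings, the model with the highest average score lands in level A and the lowest in level C.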
In one embodiment, referring to FIG. 3, the display interface 110 further includes: the marking progress display field 115, where the marking agent module 200 is further configured to determine an execution progress of the marking script 320, and then send the execution progress to the marking host system 100, and the marking host system 100 displays the execution progress in the marking progress display field 115.
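The embodiment does not say how the labeling agent module determines the execution progress. A minimal sketch, assuming the labeling script writes one output file per input file, is to compare file counts in the input and output directories; the function name and directory layout are hypothetical.

```python
import tempfile
from pathlib import Path


def execution_progress(input_dir, output_dir):
    """Return the fraction of input files that already have an output file."""
    total = sum(1 for p in Path(input_dir).iterdir() if p.is_file())
    done = sum(1 for p in Path(output_dir).iterdir() if p.is_file())
    return 0.0 if total == 0 else min(done, total) / total


with tempfile.TemporaryDirectory() as tmp:
    in_dir = Path(tmp) / "input"
    out_dir = Path(tmp) / "output"
    in_dir.mkdir()
    out_dir.mkdir()
    for i in range(4):
        (in_dir / f"{i}.jpg").write_bytes(b"")
    (out_dir / "0.json").write_text("{}")  # one of four items labeled so far
    progress = execution_progress(in_dir, out_dir)
```

The agent could send such a value to the labeling main system at intervals for display in the labeling progress display field.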
In one embodiment, referring to fig. 2, the labeling main system 100 further includes a user management unit 120 for user identity management. In one embodiment, the user management unit 120 stores the account name, login password, and usage history of each user, where the usage history is the record of the user's automatic labeling with the automatic data labeling system 10 and includes, for each use, the data set, the labeling result, and information about the automatic labeling model 310 used. The embodiment of the present application does not limit the specific content stored by the user management unit 120.
In one embodiment, referring to fig. 2, the labeling main system 100 further includes: the user labeling task receiving unit 130, where the user labeling task receiving unit 130 is an execution unit that is implemented in the display interface 110 and responds to the first trigger event, and is capable of receiving a target labeling task of a user, where the target labeling task includes: target data set, label category selected by user, and the like. Meanwhile, the user labeling task receiving unit 130 is connected with the labeling proxy module 200, and sends the target labeling task received by the user labeling task receiving unit to the labeling proxy module 200.
In one embodiment, referring to fig. 2, the labeling main system 100 further includes: the labeling result receiving unit 140, where the labeling result receiving unit 140 is connected to the labeling proxy module 200, and is configured to receive a labeling result corresponding to the target labeling task returned by the labeling proxy module 200.
In one embodiment, referring to fig. 2, the labeling main system 100 further includes: the registration unit 150, where the registration unit 150 is configured to register the automatic labeling model 310 to generate the labeling model algorithm module 300 and the associated labeling agent module 200 corresponding to the automatic labeling model 310, it is understood that the automatic labeling model 310 may be registered in the automatic data labeling system 10 in advance, and a registration process will be described in detail in the following embodiments.
In one embodiment, referring to fig. 2, the labeling main system 100 further includes an execution progress acquiring unit 160, which is connected to the labeling agent module 200 and receives the execution progress, so that the execution progress can be displayed in the labeling progress display field 115 of the display interface 110.
In an embodiment, referring to FIG. 2, an example of 2 labeling agent modules 200 is illustrated, and is not representative of limiting the number of labeling agent modules 200.
The labeling proxy module 200 is used for linking with the labeling main system 100 and the labeling model algorithm module 300 to complete the adapting function, and the labeling proxy module 200 includes an algorithm adapter 210, where the algorithm adapter 210 is used to connect with a labeling script 320 in the labeling model algorithm module 300, and the algorithm adapter 210 can call the labeling script 320 to execute an automatic labeling process, specifically, can send a target labeling task to the labeling script 320 associated therewith to perform automatic labeling, and then obtains a labeling result obtained by automatic labeling based on the labeling script 320. It will be appreciated that if multiple annotation scripts 320 are included in the annotation model algorithm module 300, each annotation script 320 corresponds to one of the algorithm adapters 210.
In one embodiment, referring to FIG. 2, the labeling agent module 200 further comprises a registration model unit 220. The registration model unit 220 is connected to the registration unit 150 in the labeling main system 100 and to the corresponding automatic labeling model 310 in the labeling model algorithm module 300. It obtains the parameter information of the automatic labeling model 310 and sends the parameter information to the registration unit 150 in the labeling main system 100, so as to register the automatic labeling model 310 in the labeling main system 100. The registration process is described in detail in the following embodiments.
In one embodiment, referring to FIG. 2, the labeling agent module 200 further comprises an accept labeling task unit 230 and a return labeling result unit 240. The accept labeling task unit 230 is connected to the algorithm adapter 210 and to the user labeling task receiving unit 130 in the labeling main system 100, and is configured to receive a target labeling task and send it to the algorithm adapter 210. The return labeling result unit 240 is connected to the algorithm adapter 210 and to the labeling result receiving unit 140 in the labeling main system 100, and is configured to send the labeling result corresponding to the target labeling task to the labeling main system 100.
In an embodiment, referring to fig. 2, the labeling model algorithm module 300 further includes an algorithm start script 330 and an input/output catalog unit 340. The algorithm start script 330 is connected to the algorithm adapter 210 and the labeling script 320, and executes the automatic labeling process with the labeling script 320 according to the call command sent by the algorithm adapter 210. The input/output catalog unit 340 is connected, on the one hand, to the algorithm adapter 210, from which it receives the information characterizing the preset input position of the data set input by the user in the target labeling task; on the other hand, it is connected to the automatic labeling model 310 and the return labeling result unit 240, and sends the preset output position of the labeling result that the automatic labeling model 310 outputs for the target labeling task to the labeling agent module 200, so that the labeling agent module 200 can obtain the labeling result and send it to the labeling main system 100. The input/output catalog unit 340 in this embodiment is an inference script that supports the input and output formats of the labeling agent module 200 and is called by the labeling agent module 200.
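The interplay of the algorithm adapter, the call command, and the preset input and output positions can be sketched as follows. This is a minimal sketch under stated assumptions: the script content, the `labels.json` file name, the placeholder label value, and the class name are hypothetical, and the call command is simply run as a subprocess with the two positions passed as arguments.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical labeling script: reads every file in the input directory and
# writes one placeholder label per file to the output directory.
LABELING_SCRIPT = """
import json, sys
from pathlib import Path
input_dir, output_dir = Path(sys.argv[1]), Path(sys.argv[2])
labels = {p.name: "unlabeled" for p in sorted(input_dir.iterdir())}
(output_dir / "labels.json").write_text(json.dumps(labels))
"""


class AlgorithmAdapter:
    """Sketch of algorithm adapter 210: runs the call command, then reads the result."""

    def __init__(self, call_command, input_dir, output_dir):
        self.call_command = list(call_command)
        self.input_dir = Path(input_dir)
        self.output_dir = Path(output_dir)

    def run(self):
        # Pass the preset input and output positions to the script.
        subprocess.run(
            self.call_command + [str(self.input_dir), str(self.output_dir)],
            check=True,
        )
        return json.loads((self.output_dir / "labels.json").read_text())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "in").mkdir()
    (root / "out").mkdir()
    (root / "in" / "img_001.jpg").write_bytes(b"")
    script = root / "label.py"
    script.write_text(LABELING_SCRIPT)
    adapter = AlgorithmAdapter([sys.executable, str(script)], root / "in", root / "out")
    result = adapter.run()
```

Because the adapter only knows a command and two directories, swapping in a different labeling script does not require changes on the adapter side in this sketch.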
As can be seen from the foregoing, the automatic data labeling system according to the embodiment of the present application includes three layers, the first layer is the labeling main system 100, the second layer is the labeling agent module 200, and the third layer is the labeling model algorithm module 300. Specifically, the annotation host system 100 provides functionality for an interoperable automatic annotation portal that enables automatic annotation only after access to the annotation agent module 200.
The labeling agent module 200 serves as an adaptation layer that communicates with both the labeling main system 100 and the labeling model algorithm module 300. It receives the target labeling task transmitted by the labeling main system 100, acquires the related input data from the labeling main system 100, and calls the inference script of the automatic labeling model provided by the labeling model algorithm module 300 to perform automatic labeling. After labeling is completed, it reads the automatic labeling result according to the output data format of the automatic labeling model and returns the result to the labeling main system 100. The labeling agent module 200 and the labeling main system 100 can communicate through a standardized RESTful interface and can therefore be physically or virtually isolated from each other; as long as communication between them is unobstructed, the requirement of labeling through the inference interface of a pre-trained automatic labeling model can be met. Meanwhile, the standardized interface provided by the labeling agent module 200 adapts to the input parameter formats and output parameter formats of different types of automatic labeling model algorithms, which hides the differences between automatic labeling model algorithms, so that any automatic labeling model can be connected to the automatic data labeling system.
Different automatic labeling algorithms can be accessed in the labeling model algorithm module 300 according to application requirements, so that the automatic data labeling system can support labeling in multiple scenes such as target detection, target segmentation, text labeling, oversized image (medical image) labeling, video labeling, and pedestrian re-identification. Because different automatic labeling models may require different environments when running inference, such as different PyTorch versions or CUDA versions, it is difficult to run multiple automatic labeling models in a single system. In this embodiment, therefore, each automatic labeling algorithm in the labeling model algorithm module 300 runs in a separate Docker container or virtual machine; for example, the algorithm of one automatic labeling model is installed in each Docker container. This keeps the algorithms of the automatic labeling models independent of one another and solves the problem of diverse model environments. In some scenarios, multiple automatic labeling tasks can be executed at the same time, with different automatic labeling models processing in parallel.
As the above description of the layering mechanism shows, the automatic data labeling system in the embodiment of the application uses the layering mechanism to decouple the labeling main system from the labeling model algorithm module, so that different automatic labeling models can be dynamically added and accessed for automatic labeling. If an automatic labeling model that the labeling main system does not yet support is encountered, only a labeling agent module 200 needs to be added to provide the call to that automatic labeling model, and the labeling main system 100 does not need to be modified. Meanwhile, only the communication between the labeling agent module 200 and the labeling main system 100 needs to be exposed. The labeling agent module 200 transfers data according to the input and output data formats required by the different automatic labeling algorithms, and the labeling model algorithm module 300 does not communicate directly with the labeling main system 100. For the labeling model algorithm module 300, this reduces the risk of exposing the algorithm implementation; the labeling main system 100, for its part, does not need to be aware of the differences between automatic labeling models or to perform actions such as pulling data sets. This further supports the extension and adaptation of automatic labeling models and broadens the application scenarios of the automatic data labeling system.
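The adaptation role of the standardized interface can be sketched as a request handler that accepts one uniform task format and converts it into the input format a particular model expects. The embodiment does not define the request fields, so the JSON keys, category names, and builder functions below are hypothetical, and the HTTP transport itself is left out.

```python
import json


# Hypothetical per-model input builders: each turns the standardized task
# into the input parameter format one automatic labeling model expects.
def build_detection_input(task):
    return {"image_dir": task["dataset_path"]}


def build_text_input(task):
    return {"text_dir": task["dataset_path"]}


INPUT_BUILDERS = {
    "target_detection": build_detection_input,
    "text_labeling": build_text_input,
}


def handle_task_request(body):
    """Handle the JSON body of a task request sent by the labeling main system."""
    task = json.loads(body)
    builder = INPUT_BUILDERS.get(task.get("labeling_category"))
    if builder is None:
        return json.dumps({"status": "rejected", "reason": "unsupported labeling category"})
    return json.dumps({"status": "accepted", "model_input": builder(task)})


request_body = json.dumps(
    {"labeling_category": "target_detection", "dataset_path": "/data/input"}
)
response = json.loads(handle_task_request(request_body))
```

The labeling main system only ever sends the uniform request in this sketch; the per-model differences stay inside the builders on the agent side.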
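Running each automatic labeling algorithm in its own container can be sketched by building one `docker run` command per model. The image names, mount points, and call commands below are hypothetical; the sketch only assembles the command line and does not start any container.

```python
def docker_run_command(image, input_dir, output_dir, call_command):
    """Build a `docker run` command that isolates one labeling model in its own container."""
    return [
        "docker", "run", "--rm",
        "-v", f"{input_dir}:/data/input:ro",  # data set mounted read-only
        "-v", f"{output_dir}:/data/output",   # labeling results written here
        image,
        *call_command,
    ]


# Two hypothetical models with different environments, each in its own image.
detection_cmd = docker_run_command(
    "detector-env:example", "/srv/tasks/1/input", "/srv/tasks/1/output",
    ["python", "label.py", "/data/input", "/data/output"],
)
text_cmd = docker_run_command(
    "text-env:example", "/srv/tasks/2/input", "/srv/tasks/2/output",
    ["python", "label.py", "/data/input", "/data/output"],
)
```

Since each command refers to a separate image, the two models can depend on different framework versions and still run side by side.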
The following describes a process of registering an automatic annotation model in an automatic data annotation system in an embodiment of the present application. In one embodiment, referring to FIG. 5, the process of registering an automatic annotation model in an automatic data annotation system comprises the steps of:
Step S410: display the model input identifier on the display interface of the labeling main system, and, in response to a third trigger operation on the model input identifier, acquire the automatic labeling model to be registered.
In one embodiment, referring to FIG. 6, the model input identifier 116 is displayed on the display interface of the labeling main system 100. The model input identifier 116 guides the user to upload a pre-trained automatic labeling model to the automatic data labeling system so that the corresponding labeling model algorithm module 300 and labeling agent module 200 can be constructed.
After a third trigger operation by the user on the model input identifier is received, the automatic labeling model to be registered is acquired in response to the third trigger operation. The third trigger operation may be a click or touch operation. For example, after the model input identifier is clicked, an input box pops up to guide the user to enter an upload path, where the upload path is the storage address of the automatic labeling model to be registered.
Step S420: generate a labeling model algorithm module according to the automatic labeling model, and generate the labeling agent module of the labeling model algorithm module.
In one embodiment, referring to fig. 7, step S420 specifically includes the following steps:
Step S421: obtain the labeling category of the automatic labeling model, together with the preset output format, preset input address, and preset output address of the labeling category.
In an embodiment, automatic labeling models of different labeling categories have different input and output data formats. For example, a target detection model and a pedestrian re-identification model differ: the input of the target detection model is a rectangular frame, whereas the input of the pedestrian re-identification model is a plurality of small pictures and the object to be identified, and its output is the similarity between each picture and the target object. This embodiment defines a different data format for each labeling category and uses different numbers to represent the labeling categories. For example: 1 represents target detection labeling, specifically automatic labeling of pictures; 2 represents target detection labeling, specifically automatic labeling of DCM data; 3 represents target detection labeling, specifically automatic labeling of oversized images; 4 represents target detection labeling, specifically automatic labeling of video; 5 represents target tracking labeling, specifically automatic labeling for single-target tracking; 6 represents target tracking labeling, specifically automatic labeling for multi-target tracking; 7 represents text labeling, specifically automatic labeling of text entities; 8 represents target segmentation labeling; and so on. As for the output data format, labeling categories 1, 2, 3, and 4 share the same output format for labeling results, which may be a text description or similarity value of the target detection result; labeling categories 5 and 6 share the same output format, which may be a rectangular frame or the like; the output format for labeling category 7 is a text classification result or the like; and the output format for labeling category 8 is a similarity value or the like.
It will be appreciated that the above description of annotation class and data format definitions for different automatic annotation models is illustrative only and not limiting.
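The numbered labeling categories and their output formats listed above can be captured in a small lookup table. The numbers and format descriptions follow the text; the enum member names and the mapping structure are hypothetical and chosen only for illustration.

```python
from enum import IntEnum


class LabelingCategory(IntEnum):
    """Labeling category numbers as listed in the embodiment (names are illustrative)."""
    DETECTION_PICTURE = 1
    DETECTION_DCM = 2
    DETECTION_OVERSIZED_IMAGE = 3
    DETECTION_VIDEO = 4
    TRACKING_SINGLE_TARGET = 5
    TRACKING_MULTI_TARGET = 6
    TEXT_ENTITY = 7
    TARGET_SEGMENTATION = 8


def output_format(category):
    """Return the output format description the text gives for a labeling category."""
    category = LabelingCategory(category)
    if category <= LabelingCategory.DETECTION_VIDEO:
        return "text description or similarity value of the target detection result"
    if category <= LabelingCategory.TRACKING_MULTI_TARGET:
        return "rectangular frame"
    if category is LabelingCategory.TEXT_ENTITY:
        return "text classification result"
    return "similarity value"


formats = {int(c): output_format(c) for c in LabelingCategory}
```

Passing a number outside 1 to 8 raises `ValueError` in this sketch, which is one simple way to reject an unregistered labeling category.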
In the above embodiment, for the automatic labeling model to be registered, the labeling category needs to be selected, along with the preset output format, preset input address, and preset output address corresponding to that labeling category. The preset output format includes the target detection result, the coordinate information of a rectangular frame, and the like. The preset input address is the storage address from which the data set is read, and the preset output address is the storage address of the labeled data produced by the automatic labeling model. That is, the automatic labeling model reads the data set at the preset input address, labels it automatically, and stores the labeled data at the preset output address, with the storage format of the labeled data determined by the preset output format.
Step S422: generate the labeling script of the automatic labeling model.
In an embodiment, the automatic labeling model is converted into a labeling script that can be used for execution, and the labeling script can generate output data in a preset output format according to input data stored in a preset input address, and store the output data in the preset output address.
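The contract of such a labeling script (read from the preset input address, write to the preset output address in the preset output format) can be sketched as below. The placeholder model, the one-JSON-file-per-item layout, and the field names are assumptions; a real script would load the trained automatic labeling model instead.

```python
import json
import tempfile
from pathlib import Path


def placeholder_model(item_path):
    """Stand-in for a trained automatic labeling model (returns one fixed box)."""
    return [{"label": "target", "box": [0, 0, 10, 10]}]


def run_labeling_script(preset_input_address, preset_output_address, model=placeholder_model):
    """Label every file at the input address and store one JSON file per item."""
    input_dir = Path(preset_input_address)
    output_dir = Path(preset_output_address)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for item in sorted(p for p in input_dir.iterdir() if p.is_file()):
        record = {"file": item.name, "annotations": model(item)}
        target = output_dir / f"{item.stem}.json"
        target.write_text(json.dumps(record))
        written.append(target.name)
    return written


with tempfile.TemporaryDirectory() as tmp:
    input_dir = Path(tmp) / "input"
    input_dir.mkdir()
    (input_dir / "a.jpg").write_bytes(b"")
    (input_dir / "b.jpg").write_bytes(b"")
    written = run_labeling_script(input_dir, Path(tmp) / "output")
```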
Step S423: generate a labeling model algorithm module according to the labeling script, and configure the labeling agent module of the labeling model algorithm module according to the parameter information of the labeling model algorithm module.
In an embodiment, the labeling model algorithm module is constructed from the labeling script generated in the above steps, and its parameter information is then configured. In one embodiment, referring to fig. 8, configuring the labeling agent module of the labeling model algorithm module according to the parameter information of the labeling model algorithm module includes the following steps:
Step S4231: configure the model information of the automatic labeling model in the labeling main system.
In one embodiment, the model information is represented as model_desc. The model information is displayed at the input selection identifier of the display interface of the labeling main system and guides the operating user in performing the first trigger operation. For example, when an automatic labeling model is being selected in the task selection box 114, referring to FIG. 4a, and the user's mouse cursor moves over a hierarchical type identifier, the model information is displayed next to the cursor. For example, in FIG. 4a, when the cursor moves to the "level A" hierarchical type identifier, the following model information is displayed: model name: m1; model score: 98 points; model registration time: xx year xx month xx day. It will be appreciated that the model information in this embodiment is merely illustrative. The model information helps the user select the automatic labeling model best suited to the application scene.
Step S4232: configure the task interface address of the automatic labeling model in the labeling agent module.
In one embodiment, the task interface address is a parameter agreed between the labeling main system and the automatic data labeling system. When the labeling main system responds to the first trigger operation, it can call the labeling agent module corresponding to the target labeling task according to the agreed task interface address. The registration unit 150 performs this step.
Step S4233: configure the registration interface address of the automatic labeling model in the labeling agent module.
In one embodiment, the registration interface address is a parameter agreed between the labeling agent module and the automatic data labeling system. According to the agreed registration interface address, the labeling agent module calls the address of the labeling script corresponding to the selected target labeling task. The registration model unit 220 performs this step.
Step S4234: configure the output interface address of the automatic labeling model in the labeling agent module.
In an embodiment, the output interface address is expressed as label_system_model_receiver_msg and is a parameter agreed between the labeling main system and the labeling agent module. After the automatic labeling model finishes automatic labeling, the output data in the preset output format is stored at the agreed output interface address, so that the labeling main system can conveniently acquire the output data at that address. The registration model unit 220 performs this step.
Step S4235: configure the preset input address, preset output format, and preset output address of the automatic labeling model in the labeling agent module.
In an embodiment, the preset input address, preset output format, and preset output address of different automatic labeling models may be different.
Step S4236: configure the call command of the labeling script in the labeling agent module.
In an embodiment, after the interface addresses in the above steps have been configured, the call command with which the labeling agent module calls the labeling script needs to be configured. Through this call command, the labeling agent module can call the labeling script to execute the automatic labeling task. The registration model unit 220 performs this step.
It can be understood that after the configuration process is completed, the labeling agent module is started to complete the process of registering the automatic labeling model to the automatic data labeling system.
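The parameters configured in steps S4231 to S4236 can be gathered into one configuration record and checked for completeness before the labeling agent module is started. Only `model_desc` and `label_system_model_receiver_msg` are names taken from the text; every other key, and all of the addresses and values, are hypothetical placeholders.

```python
REQUIRED_KEYS = (
    "model_desc",                       # model information (name from the text)
    "task_interface_address",           # hypothetical key name
    "registration_interface_address",   # hypothetical key name
    "label_system_model_receiver_msg",  # output interface address (name from the text)
    "preset_input_address",             # hypothetical key name
    "preset_output_format",             # hypothetical key name
    "preset_output_address",            # hypothetical key name
    "call_command",                     # hypothetical key name
)


def validate_agent_config(config):
    """Return the list of required parameters missing from an agent configuration."""
    return [key for key in REQUIRED_KEYS if key not in config]


# Placeholder values for illustration only.
agent_config = {
    "model_desc": {"model_name": "m1", "labeling_category": 1},
    "task_interface_address": "http://127.0.0.1:8001/task",
    "registration_interface_address": "http://127.0.0.1:8000/register",
    "label_system_model_receiver_msg": "http://127.0.0.1:8000/result",
    "preset_input_address": "/data/input",
    "preset_output_format": "rectangular_frame",
    "preset_output_address": "/data/output",
    "call_command": ["python", "label.py"],
}
missing = validate_agent_config(agent_config)
```

An empty `missing` list would indicate, in this sketch, that the agent has everything it needs to be started and registered.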
In an embodiment, the automatic data labeling system may be a client running on a terminal, where the front end of the client is the display interface of the labeling main system and the back end integrates the functions of the labeling agent module and the labeling model algorithm module. Alternatively, the labeling main system of the automatic data labeling system is a client, and the functions of the labeling agent module and the labeling model algorithm module are implemented on a server. This embodiment does not specifically limit how the automatic data labeling system is implemented.
The structure of the automatic data labeling system in the embodiments of the present application has been described above; the automatic data labeling method applied to this system in the embodiments of the present invention is described below.
Fig. 9 is an alternative flowchart of an automatic data labeling method according to an embodiment of the present invention, where the method in fig. 9 may include, but is not limited to, steps S110 to S150. It should be understood that the order of steps S110 to S150 in fig. 9 is not particularly limited; the order of the steps may be adjusted, and steps may be removed or added according to actual requirements.
Step S110: display an input selection identifier on a display interface of the labeling main system.
Step S120: in response to a first trigger operation on the input selection identifier, receive a target data set and acquire a target labeling task.
In one embodiment, a user generates the target labeling task by interacting with the input selection identifier. The target labeling task selects, according to the user's automatic labeling requirements, the automatic labeling model in a labeling model algorithm module that is to perform the automatic labeling. Referring to fig. 10, step S120 includes the following steps:
Step S121: in response to a data input operation on the data input box, receive the target data set.
In one embodiment, the user performs the data input operation by clicking or touching the data input box and selects a target data set from an upload address for uploading; the labeling main system then stores the target data set at the preset input address. The target data set is the data set that needs to be automatically labeled, such as a photo set or a text set. The upload address of the target data set is different from the preset input address at which the labeling main system stores it.
Step S122: in response to a task selection operation on the task selection box, select a labeling category.
In one embodiment, the user performs the task selection operation by clicking or touching the task selection box, in which a labeling category is selected, for example target detection annotation, target tracking annotation, target segmentation annotation, or text annotation. Referring to fig. 11, step S122 includes the following steps:
Step S1221: display a target detection annotation identifier, a target tracking annotation identifier, a target segmentation annotation identifier, or a text annotation identifier in the task selection box.
Step S1222: determine the task selection operation in response to a trigger state of at least one of the target detection annotation identifier, the target tracking annotation identifier, the target segmentation annotation identifier, or the text annotation identifier.
Step S1223: determine the labeling category according to the task selection operation.
In an embodiment, after the task selection box is clicked, a next-level menu pops up, and the target detection annotation identifier, the target tracking annotation identifier, the target segmentation annotation identifier, or the text annotation identifier is displayed in the task selection box as a distinct icon. The user selects the identifier that matches the labeling requirement, which puts that identifier in a trigger state; the task selection operation is determined from the trigger state, and the labeling category is then determined from the task selection operation. It can be appreciated that multiple labeling categories may be selected according to the user's labeling requirements, so that different labeling categories are labeled in parallel. For example, when target detection annotation and target segmentation annotation are both selected for the same target data set, the automatic data labeling system calls the labeling scripts corresponding to target detection annotation and target segmentation annotation in parallel.
Step S123: in response to a hierarchical selection operation on a hierarchical type identifier, select the labeling script associated with the hierarchical selection operation in the labeling model algorithm module corresponding to the hierarchical type.
In one embodiment, one labeling category further includes one or more different hierarchical types. The hierarchical type helps the user further refine the labeling requirement, and the hierarchical type identifier includes an evaluation identifier and/or a scene identifier. Referring to fig. 12, selecting the labeling script associated with the hierarchical selection operation in the labeling model algorithm module corresponding to the hierarchical type in step S123 includes the following steps:
Step S1231: display a hierarchical description of each automatic labeling model.
In one embodiment, the hierarchical description includes an annotation score and/or a scene description, which is displayed when the user's cursor moves over the corresponding automatic labeling model. Referring to fig. 4a, the annotation score describes scoring information of the automatic labeling model; the example annotation score in fig. 4a is "98 points, high score". Referring to fig. 4b, the scene description describes the labeling field of the automatic labeling model; the example scene description in fig. 4b is "for detecting faces in an image dataset". It is to be understood that the hierarchical descriptions in the embodiments of the present application are merely examples and are not intended to be limiting.
Step S1232: in response to one or more hierarchical selection operations based on the hierarchical description, a labeling model algorithm module corresponding to the hierarchical type is determined.
Step S1233: one or more annotation scripts associated with the hierarchical selection operation are determined in an annotation model algorithm module.
In an embodiment, in the hierarchical selection operation, the user may also select multiple hierarchical type identifiers according to the labeling requirements. For example, within the broad scenario of target detection annotation, the user may select both traffic vehicle detection and face detection, in which case the automatic data labeling system calls the labeling scripts for traffic vehicle detection and face detection in parallel.
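The parallel calling of several labeling scripts on one target data set could look roughly like the sketch below. It assumes a thread pool and plain Python functions as stand-ins for labeling scripts; the application does not specify the concurrency mechanism, and all function names and the input address are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_scripts_in_parallel(scripts, input_address):
    """Call every selected labeling script on the same data set concurrently."""
    with ThreadPoolExecutor(max_workers=max(1, len(scripts))) as pool:
        # Submit all scripts first so they run side by side, then collect results.
        futures = {name: pool.submit(fn, input_address) for name, fn in scripts.items()}
        return {name: future.result() for name, future in futures.items()}


# Hypothetical stand-ins for two labeling scripts under target detection annotation.
def traffic_vehicle_detection(input_address):
    return f"vehicle boxes for {input_address}"


def face_detection(input_address):
    return f"face boxes for {input_address}"


selected = {
    "traffic vehicle detection": traffic_vehicle_detection,
    "face detection": face_detection,
}
results = run_scripts_in_parallel(selected, "/data/input/street_photos")
```

Each script receives only the input address, which matches the description's point that scripts locate the data set themselves.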
In one embodiment, referring to fig. 13, each labeling script 320 corresponds to one labeling model algorithm module 300, and each labeling model algorithm module 300 is connected to the labeling main system 100 via its corresponding labeling agent module 200. As can be seen in the figure, the labeling categories include target detection annotation, target tracking annotation, target segmentation annotation, and text annotation. Target detection annotation contains a medical image detection labeling script, a traffic vehicle detection labeling script, and a face detection labeling script; target tracking annotation contains an A-level target tracking labeling script, a B-level target tracking labeling script, and a C-level target tracking labeling script.
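The two-level structure of fig. 13 (labeling category, then hierarchical type) can be sketched as a lookup table. This is only an illustration under the assumption that a selection is resolved by name; the dictionary layout and the function name resolve_scripts are hypothetical, and only the categories and scripts named in the description are listed.

```python
# Labeling category -> hierarchical types with a registered labeling script (per fig. 13).
SCRIPT_REGISTRY = {
    "target detection annotation": [
        "medical image detection",
        "traffic vehicle detection",
        "face detection",
    ],
    "target tracking annotation": [
        "A-level target tracking",
        "B-level target tracking",
        "C-level target tracking",
    ],
}


def resolve_scripts(category, hierarchical_types):
    """Return (category, type) pairs for the selection, rejecting unregistered types."""
    available = SCRIPT_REGISTRY.get(category, [])
    unknown = [t for t in hierarchical_types if t not in available]
    if unknown:
        raise KeyError(f"no labeling script registered for {unknown} under {category!r}")
    return [(category, t) for t in hierarchical_types]


selected = resolve_scripts(
    "target detection annotation",
    ["traffic vehicle detection", "face detection"],
)
```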
It can be understood that the labeling main system can also determine whether a labeling agent module is working normally by monitoring its heartbeat; if it is not, the execution process of that labeling agent module is removed. In addition, when multiple labeling agent modules are dynamically deployed, each labeling agent module can be shut down at any time as needed.
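Heartbeat monitoring of this kind is commonly done by recording the time of each agent's last heartbeat and dropping agents that stay silent longer than a timeout. The sketch below assumes that approach; the class name, the timeout value, and the agent identifiers are hypothetical and are not taken from the application.

```python
import time


class HeartbeatMonitor:
    """Track the last heartbeat of each labeling agent and drop silent ones."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout = timeout_seconds
        self.clock = clock          # injectable so the logic can be tested without waiting
        self.last_seen = {}

    def beat(self, agent_id):
        """Record a heartbeat received from an agent."""
        self.last_seen[agent_id] = self.clock()

    def remove_stale(self):
        """Remove and return agents whose last heartbeat is older than the timeout."""
        now = self.clock()
        stale = [a for a, seen in self.last_seen.items() if now - seen > self.timeout]
        for agent_id in stale:
            del self.last_seen[agent_id]
        return stale


# Simulated clock: the detection agent beats at t=0, the tracking agent at t=8.
fake_now = [0.0]
monitor = HeartbeatMonitor(timeout_seconds=10, clock=lambda: fake_now[0])
monitor.beat("detection-agent")
fake_now[0] = 8.0
monitor.beat("tracking-agent")
fake_now[0] = 15.0
removed = monitor.remove_stale()
```

At the simulated time 15 only the detection agent has been silent for more than 10 seconds, so it alone is removed.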
Through the above selection process, the target data set is received and the target labeling task is acquired; the target labeling task determines which labeling agent module 200, and thus which labeling script, is invoked.
Step S130: in response to a second trigger operation on the labeling start identifier, call the labeling model algorithm module by using the labeling agent module, and send the preset input address of the target data set to the labeling script; the labeling script labels the target data set by using the automatic labeling model to generate target labeling data, and the target labeling data is stored at a preset output address in a preset output format.
In an embodiment, referring to fig. 14, a start identifier "start marking" is further provided on the display interface. When the user decides to start the labeling process, the user performs the second trigger operation by clicking or touching the start identifier to start the labeling task. When the start identifier is triggered, the labeling agent module calls the labeling model algorithm module and sends the preset input address of the target data set, that is, the storage address of the target data set, to the labeling script. The labeling script reads the target data set from the preset input address and executes the labeling process with the automatic labeling model, generating target labeling data that is stored at the preset output address in the preset output format. It can be appreciated that the preset output format is set specifically for each automatic labeling model.
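The contract of a labeling script (read from the preset input address, write results in the preset output format to the preset output address) can be illustrated as follows. This is a sketch under several assumptions: addresses are local directories, the output format is JSON, the data set is a set of text files, and fake_model is a trivial stand-in for an automatic labeling model. None of these choices comes from the application.

```python
import json
import tempfile
from pathlib import Path


def fake_model(text):
    """Hypothetical stand-in for an automatic labeling model."""
    return "long" if len(text) > 20 else "short"


def labeling_script(input_address, output_address):
    """Label every item found at the input address and store the results as JSON."""
    in_dir, out_dir = Path(input_address), Path(output_address)
    out_dir.mkdir(parents=True, exist_ok=True)
    records = [
        {"item": path.name, "label": fake_model(path.read_text(encoding="utf-8"))}
        for path in sorted(in_dir.glob("*.txt"))
    ]
    out_file = out_dir / "labels.json"
    out_file.write_text(json.dumps(records, indent=2), encoding="utf-8")
    return out_file


# Usage with a temporary directory playing the role of both preset addresses.
with tempfile.TemporaryDirectory() as root:
    in_dir = Path(root) / "in"
    in_dir.mkdir()
    (in_dir / "a.txt").write_text("hello", encoding="utf-8")
    (in_dir / "b.txt").write_text("a much longer piece of text", encoding="utf-8")
    out_file = labeling_script(in_dir, Path(root) / "out")
    labels = json.loads(out_file.read_text(encoding="utf-8"))
```

Because the script only sees two addresses, the agent can swap in a different model without changing how tasks are dispatched.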
Step S140: acquire the target labeling data stored at the preset output address by using the labeling agent module, and send the target labeling data to the labeling main system.
In one embodiment, when labeling is finished, the labeling agent module reads the target labeling data at the preset output address and sends the target labeling data to the labeling main system.
In an embodiment, referring to fig. 14, the labeling agent module is further configured to determine the execution progress of the labeling script and send it to the labeling main system, which displays it in the labeling progress display field 115 of its display interface.
Step S150: display the target labeling data on the display interface of the labeling main system based on a preset display mode.
In one embodiment, referring to fig. 15, the preset display mode includes full display and address display, both of which display the target labeling data in the result display area 112. Specifically, full display means displaying the labeling result of each element in the target data set in real time on the display interface of the labeling main system, while displaying the execution progress in the labeling progress display field 115. For example, if the 10000 images in the figure need to be labeled, the result display area 112 displays "image M/noted", and the labeling progress display field 115 displays the current progress as M/10000. Address display means displaying the storage address of the target labeling data on the display interface of the labeling main system and adding a jump label that links to the storage address; for example, the storage address shown in the figure is "E \label \output \1 …", and the jump label in the figure is "skip view".
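The difference between the two display modes can be sketched as a function that decides what the result display area and the progress field show. The mode names "full" and "address", the function name render, the result strings, and the storage path are hypothetical; showing the progress in both modes is also an assumption, since the description ties it explicitly only to full display.

```python
def render(mode, labeled_items, total, storage_address):
    """Return the content of the result display area and the progress field."""
    progress = f"{len(labeled_items)}/{total}"
    if mode == "full":
        # Full display: one line per labeled element of the target data set.
        result_area = [f"{item}: {label}" for item, label in labeled_items]
    elif mode == "address":
        # Address display: the storage address plus a jump label linking to it.
        result_area = [storage_address, "skip view"]
    else:
        raise ValueError(f"unknown display mode: {mode}")
    return {"result_area": result_area, "progress": progress}


items = [("image 1", "face"), ("image 2", "no face")]
full = render("full", items, 10000, "/hypothetical/output")
address = render("address", items, 10000, "/hypothetical/output")
```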
As can be seen from the foregoing, in the automatic data labeling method according to the embodiment of the present application, an input selection identifier is displayed on a display interface of a labeling main system, then a target data set is received and a target labeling task is obtained in response to a first triggering operation for the input selection identifier, and then a labeling model algorithm module is called by using a labeling proxy module in response to a second triggering operation for a labeling starting identifier, so that a preset input address of the target data set is sent to a labeling script; the labeling script labels the target data set by using the automatic labeling model to generate target labeling data which is stored in a preset output address; and then, acquiring target marking data stored by a preset output address by using a marking agent module, and displaying the target marking data on a display interface of a marking main system based on a preset display mode.
In one embodiment, with reference to FIG. 16, the workflow of an automatic data annotation system is described.
First, the labeling main system is started. Then a labeling script is generated for the pre-trained automatic labeling model; the labeling script generates output data in the preset output format from the input data at the preset input address and stores the output data at the preset output address.
Next, the automatic labeling model is registered with and connected to the automatic data labeling system. The registration process is as follows. A labeling agent module is generated and associated with the running environment of the automatic labeling model, ensuring that the labeling agent module can call into the algorithm environment of the automatic labeling model. Parameter information of the automatic labeling model, such as model information, is then configured in the labeling agent module; after configuration, the model information is displayed in the input selection identifier on the display interface of the labeling main system and guides the operation object in performing the first trigger operation. Then the task interface address, the registration interface address, and the output interface address of the automatic labeling model are configured in the labeling agent module. Next, the preset input address, preset output format, and preset output address of the automatic labeling model are configured, followed by the root directory where the automatic labeling model is located and the call command of the labeling script in the labeling agent module; through the call command, the labeling agent module can call the labeling script to execute the automatic labeling task and store the labeling result at the preset output address in the preset output format. After the information related to the automatic labeling model has been configured, the object storage system information is configured; the object storage system receives the target data set that is read during the automatic labeling task.
Finally, the labeling agent module is started, and it registers all information of the automatic labeling model with the labeling main system.
The labeling main system receives the model registration request information sent by the labeling agent module and stores it in a database. When a first trigger operation of a user is received, the labeling main system receives and initializes the target labeling task, organizes the obtained target data set and stores it in object storage, and obtains the task interface addresses agent_receiver_task_url of the different automatic labeling models from the database. It then passes the target labeling task and the preset input address of the target data set as parameters to the labeling agent module for automatic labeling, and waits for the labeling agent module to return the target labeling data of the target labeling task.
After receiving the target labeling task, the labeling agent module obtains the preset input address of the target data set from the input parameters and sends the address to the input/output catalog unit of the labeling model algorithm module, so that the labeling model algorithm module can download the target data set. An algorithm adapter of the labeling agent module then calls the algorithm start script of the labeling model algorithm module, and the labeling script is called to complete the automatic labeling task and obtain the target labeling data. The labeling agent module transmits the target labeling data back to the labeling main system, which displays it.
In one embodiment, FIG. 17 is a schematic diagram of the flow of data information between the different modules of an automatic data annotation system.
Step S1700: the labeling agent module is registered in the labeling main system.
Step S1710: the labeling main system acquires a target labeling task of a user, stores the target labeling task in a MySQL database, and stores a target data set in an object storage.
Step S1720: the labeling main system sends the target labeling task and the preset input address of the target data set to the labeling agent module.
Step S1730: the labeling agent module downloads the target data set from a preset input address.
Step S1740: the annotation agent module invokes the annotation model algorithm module.
Step S1750: the labeling model algorithm module reads a target data set which needs to be automatically labeled by using the input catalog unit.
Step S1760: after the labeling is completed, the labeling model algorithm module stores the target labeling data at the preset output address by using the output catalog unit.
Step S1770: the labeling agent module obtains target labeling data from the input/output catalog unit.
Step S1780: the labeling agent module transmits the target labeling data back to the labeling main system.
Step S1790: the annotation host system displays the target annotation data.
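Steps S1700 to S1790 can be traced end to end in a small in-memory sketch. Plain Python containers stand in for the MySQL database and the object storage, and a function stands in for the labeling model algorithm module; the class names, the address scheme, and the example script are hypothetical and only mirror the order of the steps.

```python
class LabelingAgent:
    """Stand-in for a labeling agent module plus its labeling model algorithm module."""

    def __init__(self, category, labeling_script):
        self.category = category
        self.labeling_script = labeling_script

    def handle_task(self, object_storage, input_address):
        dataset = object_storage[input_address]                    # S1730: download data set
        labels = [self.labeling_script(item) for item in dataset]  # S1740, S1750: call and label
        output_address = input_address.replace("input", "output")
        object_storage[output_address] = labels                    # S1760: store at output address
        return object_storage[output_address]                      # S1770, S1780: read and return


class LabelingMainSystem:
    def __init__(self):
        self.agents = {}
        self.task_table = []       # plays the role of the MySQL database
        self.object_storage = {}   # plays the role of the object storage

    def register(self, agent):                                     # S1700
        self.agents[agent.category] = agent

    def submit(self, category, dataset):
        task_id = len(self.task_table)
        input_address = f"tasks/{task_id}/input"
        self.task_table.append({"id": task_id, "category": category})  # S1710
        self.object_storage[input_address] = list(dataset)
        agent = self.agents[category]
        labels = agent.handle_task(self.object_storage, input_address)  # S1720
        return labels                                              # S1790: data to display


main_system = LabelingMainSystem()
main_system.register(LabelingAgent("text annotation", lambda text: text.upper()))
shown = main_system.submit("text annotation", ["good", "bad"])
```

The main system never touches the labeling logic itself; it only stores the task, hands over an address, and receives the result.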
According to the embodiment of the application, the marking model algorithm module is set for the automatic marking models of different marking types, the marking agent module is utilized to carry out standardized interface transmission on the input and output data of the automatic marking models, and a user can finish automatic marking of the corresponding marking types only by uploading a target data set and selecting operations related to the automatic marking on a display interface of a marking main system and obtain marked result data from the display interface of the marking main system. The user does not need to know the underlying implementation logic of the automatic labeling algorithm, and only needs simple interface interaction operation. Meanwhile, the automatic data labeling system integrates the automatic labeling models corresponding to a plurality of different labeling categories, can meet the use requirement of labeling a large amount of different types of data, and improves the labeling efficiency of automatic labeling while reducing the learning cost of automatic labeling.
The embodiment of the invention also provides electronic equipment, which comprises:
at least one memory;
at least one processor;
at least one program;
the program is stored in the memory, and the processor executes the at least one program to implement the automatic data labeling method described above. The electronic equipment can be any intelligent terminal including a mobile phone, a tablet personal computer, a personal digital assistant (Personal Digital Assistant, PDA for short), a vehicle-mounted computer and the like.
Referring to fig. 18, fig. 18 illustrates a hardware structure of an electronic device according to another embodiment, the electronic device includes:
the processor 1801, which may be implemented by a general-purpose CPU (central processing unit), a microprocessor, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, and is configured to execute related programs so as to implement the technical solution provided by the embodiments of the present invention;
the memory 1802, which may be implemented in the form of a ROM (read only memory), a static storage device, a dynamic storage device, a RAM (random access memory), or the like. The memory 1802 may store an operating system and other application programs; when the technical solutions provided in the embodiments of the present disclosure are implemented by software or firmware, the relevant program code is stored in the memory 1802 and is invoked by the processor 1801 to execute the automatic data labeling method of the embodiments of the present disclosure;
an input/output interface 1803 for implementing information input and output;
the communication interface 1804 is configured to implement communication interaction between the device and other devices, and may implement communication in a wired manner (e.g. USB, network cable, etc.), or may implement communication in a wireless manner (e.g. mobile network, WIFI, bluetooth, etc.); and
A bus 1805 for transferring information between components of the device (e.g., processor 1801, memory 1802, input/output interfaces 1803, and communication interfaces 1804);
wherein the processor 1801, memory 1802, input/output interface 1803, and communication interface 1804 enable communication connection among each other within the device via bus 1805.
The embodiment of the application also provides a storage medium, which is a computer readable storage medium, and the storage medium stores a computer program, and the computer program realizes the automatic data labeling method when being executed by a processor.
The memory, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory remotely located relative to the processor, the remote memory being connectable to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
According to the automatic data labeling method, the automatic labeling device, the electronic equipment and the storage medium, input selection identifiers are displayed on a display interface of a labeling main system, then a target data set is received and a target labeling task is acquired in response to a first triggering operation aiming at the input selection identifiers, and then a labeling model algorithm module is called by a labeling agent module in response to a second triggering operation aiming at labeling starting identifiers to send preset input addresses of the target data set to a labeling script; the labeling script labels the target data set by using the automatic labeling model to generate target labeling data which is stored in a preset output address; and then, acquiring target marking data stored by a preset output address by using a marking agent module, and displaying the target marking data on a display interface of a marking main system based on a preset display mode. According to the embodiment of the application, the marking model algorithm module is set for the automatic marking models of different marking types, the marking agent module is utilized to carry out standardized interface transmission on the input and output data of the automatic marking models, and a user can finish automatic marking of the corresponding marking types only by uploading a target data set and selecting operations related to the automatic marking on a display interface of a marking main system and obtain marked result data from the display interface of the marking main system. The user does not need to know the underlying implementation logic of the automatic labeling algorithm, and only needs simple interface interaction operation. 
Meanwhile, the automatic data labeling system integrates the automatic labeling models corresponding to a plurality of different labeling categories, can meet the use requirement of labeling a large amount of different types of data, and improves the labeling efficiency of automatic labeling while reducing the learning cost of automatic labeling.
The embodiments described in the embodiments of the present application are for more clearly describing the technical solutions of the embodiments of the present application, and do not constitute a limitation on the technical solutions provided by the embodiments of the present application, and as those skilled in the art can know that, with the evolution of technology and the appearance of new application scenarios, the technical solutions provided by the embodiments of the present application are equally applicable to similar technical problems.
It will be appreciated by those skilled in the art that the technical solutions shown in the figures do not constitute limitations of the embodiments of the present application, and may include more or fewer steps than shown, or may combine certain steps, or different steps.
The above described apparatus embodiments are merely illustrative, wherein the units illustrated as separate components may or may not be physically separate, i.e. may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Those of ordinary skill in the art will appreciate that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. It should be understood that the display interfaces and interactions described in the embodiments and figures of the present application are exemplary descriptions and are not meant to be limiting.
The terms "first," "second," "third," "fourth," and the like in the description of the present application and in the above-described figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in this application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may represent: only A exists, only B exists, or both A and B exist, where A and B may be singular or plural. The character "/" generally indicates that the objects before and after it are in an "or" relationship. "At least one of" or a similar expression means any combination of these items, including any combination of a single item or plural items. For example, at least one of a, b, or c may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c may each be single or plural.
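The seven cases listed for "at least one of a, b, or c" are exactly the non-empty combinations of the three items. As a small illustration (the function name is hypothetical), they can be enumerated with the standard library:

```python
from itertools import combinations


def at_least_one_of(items):
    """Enumerate every non-empty combination of the given items."""
    return [
        combo
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


combos = at_least_one_of(["a", "b", "c"])
```

For n items there are 2^n - 1 such combinations, which gives 7 for three items.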
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the above-described division of units is merely a logical function division, and there may be another division manner in actual implementation, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in whole or in part, may be embodied in the form of a software product stored in a storage medium, which includes multiple instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the various embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing a program.
Preferred embodiments of the present application are described above with reference to the accompanying drawings, and thus do not limit the scope of the claims of the embodiments of the present application. Any modifications, equivalent substitutions and improvements made by those skilled in the art without departing from the scope and spirit of the embodiments of the present application shall fall within the scope of the claims of the embodiments of the present application.

Claims (12)

1. An automatic data labeling method, characterized in that the method is applied to an automatic data labeling system, wherein the automatic data labeling system comprises a labeling main system and N labeling agent modules, and the labeling agent modules comprise associated labeling model algorithm modules; N is the number of labeling categories and is an integer greater than or equal to 1; the labeling model algorithm module comprises a labeling script of an automatic labeling model registered in the automatic data labeling system in advance; the method comprises the following steps:
displaying an input selection identifier on a display interface of the labeling main system;
responding to a first triggering operation aiming at the input selection identification, receiving a target data set, and acquiring a target labeling task; the target labeling task is used for selecting the automatic labeling model in the labeling model algorithm module;
responding to a second triggering operation aiming at the marking starting identification, calling the marking model algorithm module by utilizing a marking proxy module, and sending a preset input address of the target data set to the marking script; the labeling script labels the target data set by using the automatic labeling model to generate target labeling data, and the target labeling data is stored in a preset output address in a preset output format;
The target annotation data stored in the preset output address are obtained by the annotation agent module, and the target annotation data are sent to the annotation main system;
and displaying on a display interface of the labeling main system based on a preset display mode.
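The flow recited in claim 1 can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the application: `demo_labeling_script` stands in for a registered labeling script, `LabelingAgent` for a labeling agent module, JSON files in a temporary directory for the preset input and output addresses, and the length-based labels for the output of an automatic labeling model.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def demo_labeling_script(input_address: Path, output_address: Path) -> None:
    """Hypothetical labeling script: reads the data set at the input address,
    labels each item, and stores the result at the output address as JSON."""
    items = json.loads(input_address.read_text())
    labels = [
        {"item": item, "label": "long" if len(item) > 3 else "short"}
        for item in items
    ]
    output_address.write_text(json.dumps(labels))


class LabelingAgent:
    """Hypothetical labeling agent module wrapping one labeling script."""

    def __init__(self, script: Callable[[Path, Path], None], output_address: Path):
        self.script = script
        self.output_address = output_address

    def run(self, input_address: Path) -> List[dict]:
        # Send the preset input address to the script, then fetch the labeled
        # data from the preset output address for the main system.
        self.script(input_address, self.output_address)
        return json.loads(self.output_address.read_text())


def label_dataset(
    scripts: Dict[str, Callable[[Path, Path], None]],
    task: str,
    dataset: List[str],
) -> List[dict]:
    """Main-system side: store the data set, pick the script registered for
    the target labeling task, and return the agent's labeled data."""
    with tempfile.TemporaryDirectory() as tmp:
        input_address = Path(tmp) / "input.json"
        input_address.write_text(json.dumps(dataset))
        agent = LabelingAgent(scripts[task], Path(tmp) / "output.json")
        return agent.run(input_address)
```

In this sketch the main system never calls the model directly: it only hands over an input address and reads back data in the agreed output format, which is the decoupling the claim relies on.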
2. The automatic data labeling method according to claim 1, wherein registering the automatic labeling model in the automatic data labeling system comprises:
displaying a model input identifier on the display interface of the labeling main system, and, in response to a third trigger operation on the model input identifier, acquiring the automatic labeling model to be registered;
and generating a labeling model algorithm module according to the automatic labeling model, and generating the labeling agent module of the labeling model algorithm module.
3. The automatic data labeling method according to claim 2, wherein the generating a labeling model algorithm module according to the automatic labeling model and generating the labeling agent module of the labeling model algorithm module comprises:
acquiring the labeling category of the automatic labeling model, and a preset output format, a preset input address and a preset output address of the labeling category;
generating the labeling script of the automatic labeling model, wherein the labeling script is used for generating output data in the preset output format from the input data at the preset input address, and storing the output data at the preset output address;
and generating the labeling model algorithm module according to the labeling script, and configuring the labeling agent module of the labeling model algorithm module according to parameter information of the labeling model algorithm module.
4. The automatic data labeling method according to claim 3, wherein the parameter information comprises: a task interface address, a registration interface address, an output interface address and a call command; the configuring the labeling agent module of the labeling model algorithm module according to the parameter information of the labeling model algorithm module comprises:
configuring the task interface address of the automatic labeling model in the labeling main system, wherein the task interface address indicates that the labeling main system selects the labeling agent module at the task interface address according to the first trigger operation;
configuring the registration interface address of the automatic labeling model in the labeling agent module, wherein the registration interface address represents the address of the labeling script called by the labeling agent module;
configuring the output interface address of the automatic labeling model in the labeling agent module, wherein the output interface address indicates that the labeling main system receives output data in the preset output format at the output interface address;
configuring the preset input address, the preset output format and the preset output address of the automatic labeling model in the labeling agent module;
and configuring the call command of the labeling script in the labeling agent module.
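As a hedged illustration of the configuration split described in claim 4, the sketch below groups the parameter information into one record and separates the part configured in the labeling main system from the part configured in the labeling agent module. The class name `ParameterInfo`, the field names, and all example values are hypothetical; the application does not prescribe a data format.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ParameterInfo:
    """Hypothetical record of the parameter information listed in claim 4."""
    task_interface_address: str
    registration_interface_address: str
    output_interface_address: str
    preset_input_address: str
    preset_output_format: str
    preset_output_address: str
    call_command: str


# Per claim 4, only the task interface address is configured in the main
# system; the remaining items are configured in the labeling agent module.
MAIN_SYSTEM_FIELDS = ("task_interface_address",)


def split_configuration(info: ParameterInfo) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (main_system_config, agent_config) for one labeling model."""
    values = asdict(info)
    main_config = {k: v for k, v in values.items() if k in MAIN_SYSTEM_FIELDS}
    agent_config = {k: v for k, v in values.items() if k not in MAIN_SYSTEM_FIELDS}
    return main_config, agent_config


# Illustrative values only; none of these addresses appear in the application.
example = ParameterInfo(
    task_interface_address="/task/detection",
    registration_interface_address="/scripts/detect.py",
    output_interface_address="/result/detection",
    preset_input_address="/data/in",
    preset_output_format="json",
    preset_output_address="/data/out",
    call_command="python detect.py",
)
main_config, agent_config = split_configuration(example)
```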
5. The automatic data labeling method according to claim 4, wherein the parameter information further comprises model information; the configuring the labeling agent module of the labeling model algorithm module according to the parameter information of the labeling model algorithm module further comprises:
configuring the model information of the automatic labeling model in the labeling main system, wherein the model information is displayed in the input selection identifier and is used for guiding an operator to perform the first trigger operation.
6. The automatic data labeling method according to claim 1, wherein the automatic data labeling system comprises one or more labeling model algorithm modules, each labeling model algorithm module comprises an automatic labeling model and the labeling script of the automatic labeling model, and the labeling scripts are used for executing labeling tasks of different grading types; the input selection identifier comprises a data input box and a task selection box, wherein the task selection box comprises a grading type identifier;
the receiving a target data set and acquiring a target labeling task in response to a first trigger operation on the input selection identifier comprises:
receiving the target data set in response to a data input operation on the data input box;
selecting the labeling category in response to a task selection operation on the task selection box;
and, in response to a grading selection operation on the grading type identifier, selecting, in the labeling model algorithm module corresponding to the grading type, the labeling script associated with the grading selection operation.
7. The automatic data labeling method according to claim 6, wherein each automatic labeling model in the labeling model algorithm module runs in a separate container or virtual machine; the grading type identifier comprises an evaluation identifier and/or a scene identifier; the selecting, in the labeling model algorithm module corresponding to the grading type, the labeling script associated with the grading selection operation comprises:
displaying a grading description of each automatic labeling model, the grading description comprising a labeling score and/or a scene description, wherein the labeling score is used for indicating the evaluation identifier and the scene description is used for indicating the scene identifier;
determining the labeling model algorithm module corresponding to the grading type in response to one or more grading selection operations based on the grading description;
and determining, in the labeling model algorithm module, one or more labeling scripts associated with the grading selection operation.
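The selection by grading description in claim 7 can be sketched as a simple filter. This is an assumption-laden illustration: the `GradingDescription` record, the `select_models` function, the score threshold, and the exact-match rule on scene descriptions are all hypothetical choices, not the method disclosed in the application.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class GradingDescription:
    """Hypothetical grading description shown for one automatic labeling model."""
    model_name: str
    labeling_score: float      # indicates the evaluation identifier
    scene_description: str     # indicates the scene identifier


def select_models(
    descriptions: List[GradingDescription],
    min_score: Optional[float] = None,
    scene: Optional[str] = None,
) -> List[str]:
    """Return the names of models matching the chosen score and/or scene."""
    selected = []
    for description in descriptions:
        if min_score is not None and description.labeling_score < min_score:
            continue
        if scene is not None and description.scene_description != scene:
            continue
        selected.append(description.model_name)
    return selected
```

Because both criteria are optional, the same function covers selection by evaluation only, by scene only, or by both, which mirrors the "and/or" wording of the claim.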
8. The automatic data labeling method according to claim 1, wherein the labeling agent module is further configured to determine an execution progress of the labeling script and send the execution progress to the labeling main system, and the execution progress is displayed on the display interface of the labeling main system.
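One possible way to realize the progress reporting of claim 8 is a callback that the agent invokes after each labeled item. The sketch below assumes progress is measured as the fraction of items processed; the function `run_labeling_with_progress` and its parameters are hypothetical, and the application does not state how progress is computed or transmitted.

```python
from typing import Callable, Iterable, List


def run_labeling_with_progress(
    items: Iterable[str],
    label_one: Callable[[str], str],
    report_progress: Callable[[float], None],
) -> List[str]:
    """Label items one by one and report the completed fraction after each.

    `report_progress` stands in for sending the execution progress to the
    labeling main system, which would then display it.
    """
    pending = list(items)
    results = []
    for done, item in enumerate(pending, start=1):
        results.append(label_one(item))
        report_progress(done / len(pending))
    return results
```

A usage example would pass a list's `append` method as `report_progress` to collect the reported values, or a function that updates a progress bar on the display interface.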
9. The automatic data labeling method according to any one of claims 6 to 8, wherein the labeling category comprises at least one of: target detection labeling, target tracking labeling, target segmentation labeling or text labeling; the selecting the labeling category in response to a task selection operation on the task selection box comprises:
displaying a target detection labeling identifier, a target tracking labeling identifier, a target segmentation labeling identifier or a text labeling identifier in the task selection box;
determining the task selection operation in response to a trigger state of at least one of the target detection labeling identifier, the target tracking labeling identifier, the target segmentation labeling identifier or the text labeling identifier;
and determining the labeling category according to the task selection operation.
10. An automatic data labeling system, characterized by comprising a labeling main system and N labeling agent modules, wherein each labeling agent module comprises an associated labeling model algorithm module; N is the number of labeling categories and is an integer greater than or equal to 1; the labeling model algorithm module comprises a labeling script of an automatic labeling model; the system is configured to perform the automatic data labeling method according to any one of claims 1 to 9.
11. An electronic device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the automatic data labeling method according to any one of claims 1 to 9 when executing the computer program.
12. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the automatic data labeling method according to any one of claims 1 to 9.
CN202310401758.5A 2023-04-11 2023-04-11 Automatic data labeling method, system, equipment and storage medium Pending CN116467598A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310401758.5A CN116467598A (en) 2023-04-11 2023-04-11 Automatic data labeling method, system, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310401758.5A CN116467598A (en) 2023-04-11 2023-04-11 Automatic data labeling method, system, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116467598A (en) 2023-07-21

Family

ID=87181808

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310401758.5A Pending CN116467598A (en) 2023-04-11 2023-04-11 Automatic data labeling method, system, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116467598A (en)

Similar Documents

Publication Publication Date Title
CN109492698B (en) Model training method, object detection method and related device
CN105264529B (en) The data for being used for the machine application are indexed
CN106254848B (en) A kind of learning method and terminal based on augmented reality
CN102520841B (en) Collection user interface
CN108345543B (en) Data processing method, device, equipment and storage medium
CN110020411B (en) Image-text content generation method and equipment
CN104520850B (en) Three dimensional object browses in document
US20090158238A1 (en) Method and apparatus for providing api service and making api mash-up, and computer readable recording medium thereof
CN107003877A (en) The context deep-link of application
CN104281656B (en) The method and apparatus of label information are added in the application
JP2020521376A (en) Agent decisions to perform actions based at least in part on image data
CN107368550B (en) Information acquisition method, device, medium, electronic device, server and system
CN105975393B (en) Page display detection method and system
CN105849758A (en) Multi-modal content consumption model
CN107632751B (en) Information display method and device
CN110998503A (en) Capture content sharing interface
CN106371706A (en) Method and device for site selection of application shortcuts
CN110851211A (en) Method, apparatus, electronic device, and medium for displaying application information
CN114647412A (en) Content display method and terminal equipment
CN112148962B (en) Method and device for pushing information
CN112732379A (en) Operation method of application program on intelligent terminal, terminal and storage medium
CN112506503A (en) Programming method, device, terminal equipment and storage medium
CN117251538A (en) Document processing method, computer terminal and computer readable storage medium
US20170277722A1 (en) Search service providing apparatus, system, method, and computer program
CN116467598A (en) Automatic data labeling method, system, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination