CN114329128A - Method and device for acquiring marking data, computer equipment and storage medium - Google Patents

Method and device for acquiring marking data, computer equipment and storage medium

Info

Publication number
CN114329128A
Authority
CN
China
Prior art keywords
data
request
annotation
labeling
marking
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111666364.XA
Other languages
Chinese (zh)
Inventor
刘圣
杨逍红
王姣
吴佐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Aibee Technology Co Ltd
Original Assignee
Beijing Aibee Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Aibee Technology Co Ltd
Priority to CN202111666364.XA
Publication of CN114329128A

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The application relates to a method and an apparatus for acquiring annotation data, a computer device, a storage medium and a computer program product. The method is applied to an annotation request terminal and comprises: acquiring attribute information of a target model class; constructing an instance of the target model class according to the attribute information; based on an annotation request method in the instance, sending a data annotation request corresponding to data to be annotated to a data annotation terminal, so that the data annotation terminal annotates the data to be annotated into annotation data according to the data annotation request; and, based on a result download method in the instance, sending to the data annotation terminal a result query request asking whether the annotation data exists, and, if an annotation data presence response returned by the data annotation terminal in response to the result query request is received, sending a result download request to the data annotation terminal and receiving the annotation data returned by the data annotation terminal in response to the result download request. The method can improve the efficiency with which the annotation request terminal acquires annotation data.

Description

Method and device for acquiring marking data, computer equipment and storage medium
Technical Field
The present application relates to the field of data interaction technologies, and in particular, to a method and an apparatus for acquiring annotation data, a computer device, a storage medium, and a computer program product.
Background
In the field of artificial intelligence, various models are used to implement functions such as natural language processing, image recognition, and intelligent recommendation. Because a model is optimized and adjusted using annotated data sets (e.g., a training set or a test set) constructed from annotation data, the annotation data needs to be prepared in advance in order to improve model performance. The data annotation process involves interaction between an annotation request terminal and a data annotation terminal, where the data annotation terminal is used for annotating the data.
At the annotation request terminal, the conventional way of acquiring annotation data requires manually sending the data to be annotated to the data annotation terminal, manually checking whether the data annotation terminal has completed the annotation work, and manually downloading the annotation data completed by the data annotation terminal.
Therefore, the conventional technology consumes considerable labor cost, and the annotation request terminal acquires annotation data inefficiently.
Disclosure of Invention
In view of the above, it is necessary to provide a method, an apparatus, a computer device, a computer-readable storage medium, and a computer program product for acquiring annotation data, which can improve the efficiency with which an annotation request terminal acquires annotation data.
In a first aspect, the present application provides a method for acquiring annotation data. The method is applied to an annotation request terminal and comprises:
acquiring attribute information of a target model class;
constructing an instance of the target model class according to the attribute information;
based on the annotation request method in the instance, sending a data annotation request corresponding to the data to be annotated to a data annotation terminal, so that the data annotation terminal annotates the data to be annotated into annotation data according to the data annotation request;
based on the result download method in the instance, sending to the data annotation terminal a result query request asking whether the annotation data exists, and, if an annotation data presence response returned by the data annotation terminal in response to the result query request is received, sending a result download request to the data annotation terminal and receiving the annotation data returned by the data annotation terminal in response to the result download request.
In one embodiment, the method further comprises:
based on the operation selection method in the instance, sending operation option information for the annotation data to an annotation request account of the annotation request terminal, and acquiring an operation instruction generated by the annotation request account in response to the operation option information;
and if the operation instruction is a continue-annotation instruction, sending a further data annotation request corresponding to the continue-annotation instruction to the data annotation terminal, based on the annotation request method in the instance that corresponds to the continue-annotation instruction.
In one embodiment, the method further comprises:
and if the operation instruction is an end-annotation instruction, verifying whether the annotation data is qualified based on a result verification method in the instance, and, if the annotation data is verified as qualified, constructing an annotated data set from the annotation data.
In one embodiment, the method further comprises:
sending an annotation request message corresponding to the data annotation request to a data annotation account of the data annotation terminal;
and/or
sending an operation selection message corresponding to the operation option information to the annotation request account of the annotation request terminal.
In one embodiment, the method further comprises:
based on the data splitting method in the instance, splitting initial data to be annotated according to annotation type to obtain the data to be annotated;
wherein the sending, based on the annotation request method in the instance, of a data annotation request corresponding to the data to be annotated to a data annotation terminal, so that the data annotation terminal annotates the data to be annotated into annotation data according to the data annotation request, comprises:
based on the annotation request method in the instance, generating an annotation task according to the data to be annotated and annotation requirement information, and sending a data annotation request carrying the annotation task to the data annotation terminal, so that the data annotation terminal annotates the data to be annotated into annotation data according to the annotation task in the data annotation request.
In one embodiment, the obtaining attribute information of the target model class includes:
acquiring, through the annotation request terminal, a specific attribute value of the target model class input by the annotation request account;
reading the basic attributes, basic attribute values and specific attributes of a pre-stored basic model class;
assigning the specific attribute value to the specific attribute of the basic model class, so as to convert the basic model class into the target model class;
and determining the basic attributes, basic attribute values, specific attributes and specific attribute values of the target model class as the attribute information of the target model class.
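The attribute-assignment steps above can be sketched in Python as follows. This is only an illustration under assumptions: the function name, the attribute names and the dictionary representation of a model class are hypothetical and do not come from the application.

```python
def build_target_attributes(base_attributes, specific_attribute_names, specific_values):
    """Combine pre-stored basic attributes with user-supplied specific values.

    base_attributes: basic attributes and their values of the basic model class.
    specific_attribute_names: specific attributes declared by the basic model class.
    specific_values: specific attribute values entered by the annotation request account.
    """
    missing = [name for name in specific_attribute_names if name not in specific_values]
    if missing:
        raise ValueError(f"missing specific attribute values: {missing}")
    # Start from the basic attributes, then assign each specific attribute its value.
    target = dict(base_attributes)
    for name in specific_attribute_names:
        target[name] = specific_values[name]
    return target


# Hypothetical attribute names, for illustration only.
attribute_info = build_target_attributes(
    base_attributes={"poll_interval_minutes": 10},
    specific_attribute_names=["annotation_type", "download_path"],
    specific_values={"annotation_type": "face_box", "download_path": "/tmp/results"},
)
```

The resulting dictionary plays the role of the attribute information from which an instance would then be constructed.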
In a second aspect, the present application further provides an apparatus for acquiring annotation data. The apparatus is applied to an annotation request terminal and comprises:
an information acquisition module, used for acquiring the attribute information of the target model class;
an instance construction module, used for constructing an instance of the target model class according to the attribute information;
an annotation request module, used for sending a data annotation request corresponding to the data to be annotated to a data annotation terminal based on the annotation request method in the instance, so that the data annotation terminal annotates the data to be annotated into annotation data according to the data annotation request;
and a result download module, used for sending to the data annotation terminal, based on the result download method in the instance, a result query request asking whether the annotation data exists, sending a result download request to the data annotation terminal if an annotation data presence response returned by the data annotation terminal in response to the result query request is received, and receiving the annotation data returned by the data annotation terminal in response to the result download request.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor implementing the following steps when executing the computer program:
acquiring attribute information of a target model class;
constructing an instance of the target model class according to the attribute information;
based on the annotation request method in the instance, sending a data annotation request corresponding to the data to be annotated to a data annotation terminal, so that the data annotation terminal annotates the data to be annotated into annotation data according to the data annotation request;
based on the result download method in the instance, sending to the data annotation terminal a result query request asking whether the annotation data exists, and, if an annotation data presence response returned by the data annotation terminal in response to the result query request is received, sending a result download request to the data annotation terminal and receiving the annotation data returned by the data annotation terminal in response to the result download request.
In a fourth aspect, the present application further provides a computer-readable storage medium. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of:
acquiring attribute information of a target model class;
constructing an instance of the target model class according to the attribute information;
based on the annotation request method in the instance, sending a data annotation request corresponding to the data to be annotated to a data annotation terminal, so that the data annotation terminal annotates the data to be annotated into annotation data according to the data annotation request;
based on the result download method in the instance, sending to the data annotation terminal a result query request asking whether the annotation data exists, and, if an annotation data presence response returned by the data annotation terminal in response to the result query request is received, sending a result download request to the data annotation terminal and receiving the annotation data returned by the data annotation terminal in response to the result download request.
In a fifth aspect, the present application further provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, performs the steps of:
acquiring attribute information of a target model class;
constructing an instance of the target model class according to the attribute information;
based on the annotation request method in the instance, sending a data annotation request corresponding to the data to be annotated to a data annotation terminal, so that the data annotation terminal annotates the data to be annotated into annotation data according to the data annotation request;
based on the result download method in the instance, sending to the data annotation terminal a result query request asking whether the annotation data exists, and, if an annotation data presence response returned by the data annotation terminal in response to the result query request is received, sending a result download request to the data annotation terminal and receiving the annotation data returned by the data annotation terminal in response to the result download request.
According to the above method, apparatus, computer device, storage medium and computer program product for acquiring annotation data, an instance is constructed from the attribute information of the target model class; a data annotation request is automatically generated and sent to the data annotation terminal based on the annotation request method in the instance; a result query request is automatically sent to the data annotation terminal based on the result download method in the instance; and a result download request is automatically triggered when an annotation data presence response is received, so that the annotation data of the data annotation terminal is acquired. By constructing an instance of the target model class and calling the methods in the instance, the steps by which the annotation request terminal acquires annotation data can be executed automatically as a pipeline, which reduces user operations and improves the efficiency with which the annotation request terminal acquires annotation data.
Drawings
FIG. 1 is a diagram of an application environment of a method for retrieving annotation data in one embodiment;
FIG. 2 is a flowchart illustrating a method for obtaining annotation data according to an embodiment;
FIG. 3 is a block diagram showing an exemplary embodiment of an apparatus for acquiring annotation data;
FIG. 4 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The method for acquiring annotation data provided by the embodiments of the application can be applied to the application environment shown in fig. 1. The annotation request terminal 102 communicates with the data annotation terminal 104 via a network. Specifically, the annotation request terminal 102 obtains attribute information of the target model class, and constructs an instance of the target model class according to the attribute information. The annotation request terminal 102 sends a data annotation request corresponding to data to be annotated to the data annotation terminal 104 based on the annotation request method in the instance. The data annotation terminal 104 annotates the data to be annotated into annotation data according to the data annotation request. The annotation request terminal 102 sends a result query request asking whether annotation data exists to the data annotation terminal 104 based on the result download method in the instance. If the data annotation terminal 104 returns an annotation data presence response to the annotation request terminal 102 in response to the result query request, the annotation request terminal 102 sends a result download request to the data annotation terminal 104. The data annotation terminal 104 returns the annotation data to the annotation request terminal 102 in response to the result download request.
The annotation request terminal 102 and the data annotation terminal 104 may be, but are not limited to, various personal computers, notebook computers, smart phones, tablet computers, internet of things devices, and portable wearable devices. The internet of things devices may be smart speakers, smart televisions, smart air conditioners, smart in-vehicle devices, and the like. The portable wearable devices may be smart watches, smart bracelets, head-mounted devices, and the like. The annotation request terminal 102 and the data annotation terminal 104 can also be implemented by an independent server or a server cluster composed of a plurality of servers.
In an embodiment, as shown in fig. 2, a method for acquiring annotation data is provided. The method is described by taking its application to the annotation request terminal 102 in fig. 1 as an example, and includes the following steps:
Step S202, acquiring the attribute information of the target model class.
The model class refers to a class for a model. In this embodiment, a class describes a collection of objects in a model that have the same attributes and methods, and defines the attributes and methods common to every object in the collection. In object-oriented programming languages (e.g., Python), model classes can be created with class statements.
Specifically, the annotation request terminal obtains the attribute information of the target model class.
Step S204, constructing an instance of the target model class according to the attribute information.
Specifically, the annotation request terminal instantiates the target model class according to the attribute information to obtain an instance of the target model class. The instance is configured with abstract methods for acquiring annotation data, such as an annotation request method, a result download method, and an operation selection method.
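Steps S202 and S204 can be sketched in Python as follows. This is a minimal illustration under assumptions, not the applicant's implementation: every class, method and attribute name below is hypothetical.

```python
from abc import ABC, abstractmethod


class AnnotationModel(ABC):
    """Hypothetical base model class; all names are illustrative assumptions."""

    def __init__(self, **attributes):
        # Attribute information of the target model class (step S202).
        self.attributes = dict(attributes)

    @abstractmethod
    def request_annotation(self, data):
        """Annotation request method."""

    @abstractmethod
    def download_result(self):
        """Result download method."""

    @abstractmethod
    def select_operation(self, annotation_data):
        """Operation selection method."""


class FaceBoxModel(AnnotationModel):
    """Hypothetical target model class with trivial method bodies."""

    def request_annotation(self, data):
        return {"type": self.attributes["annotation_type"], "data": data}

    def download_result(self):
        return []

    def select_operation(self, annotation_data):
        return "stop"


# Step S204: construct an instance from the attribute information.
instance = FaceBoxModel(annotation_type="face_box", owner="requester_a")
request = instance.request_annotation(["img_001.jpg"])
```

Declaring the methods as abstract in a base class is one possible design; it forces each target model class to supply its own request, download and selection behaviour.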
Step S206, based on the annotation request method in the instance, sending the data annotation request corresponding to the data to be annotated to the data annotation terminal, so that the data annotation terminal annotates the data to be annotated into annotation data according to the data annotation request.
The data to be annotated can be an image to be annotated, a text to be annotated, and the like.
Specifically, the annotation request terminal calls the annotation request method in the instance, and sends a data annotation request corresponding to the data to be annotated to the data annotation terminal based on the annotation request method. Optionally, the annotation request terminal generates the data annotation request from the data to be annotated, so that the data annotation request carries the data to be annotated. Optionally, the annotation request terminal generates the data annotation request from a storage directory corresponding to the data to be annotated, so that the data annotation request carries the storage directory and the data annotation terminal can load the data to be annotated from that directory. The data annotation terminal receives and parses the data annotation request, and then annotates the data to be annotated into annotation data.
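The two optional ways of building the request (carrying the data itself or carrying a storage directory) can be sketched as below. The payload class, its field names and the example paths are assumptions made for illustration; the application does not define a request format.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class DataAnnotationRequest:
    """Hypothetical request payload; field names are assumptions."""
    items: Optional[Sequence[str]] = None   # data to be annotated, carried inline
    storage_dir: Optional[str] = None       # or a directory the annotator loads from


def build_request(items=None, storage_dir=None):
    # Exactly one of the two options must be supplied in this sketch.
    if (items is None) == (storage_dir is None):
        raise ValueError("provide exactly one of items or storage_dir")
    return DataAnnotationRequest(items=items, storage_dir=storage_dir)


inline_request = build_request(items=["img_001.jpg", "img_002.jpg"])
directory_request = build_request(storage_dir="/data/to_annotate/batch_1")
```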
Step S208, based on the result download method in the instance, sending to the data annotation terminal a result query request asking whether the annotation data exists, and, if an annotation data presence response returned by the data annotation terminal in response to the result query request is received, sending a result download request to the data annotation terminal and receiving the annotation data returned by the data annotation terminal in response to the result download request.
The annotation data can be regarded as the result obtained after the data to be annotated has been annotated. The result query request is used for querying whether the annotation data exists. The annotation data presence response indicates that the annotation data exists.
Specifically, after sending the data annotation request, the annotation request terminal calls the result download method in the instance and, based on that method, sends a result query request asking whether the annotation data exists to the data annotation terminal at a preset time interval (for example, 10 minutes). If the data annotation terminal returns an annotation data absence response in response to the result query request, annotation has not been completed, and the annotation request terminal waits for the preset time interval before sending the result query request again. If the data annotation terminal returns an annotation data presence response, annotation has been completed, and the annotation request terminal sends a result download request to the data annotation terminal. The data annotation terminal returns the completed annotation data to the annotation request terminal according to the result download request, and the annotation request terminal downloads the annotation data to a preset path.
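The query-then-download loop of step S208 can be sketched as follows. The two callables stand in for the result query request and the result download request and are hypothetical, as is the optional `max_queries` cap, which the application does not mention. The sleep function is injectable so the example runs without real waiting.

```python
import time


def fetch_annotation_data(query_exists, download, interval_seconds=600,
                          sleep=time.sleep, max_queries=None):
    """Poll until annotation data exists, then download it.

    query_exists and download are hypothetical callables standing in for the
    result query request and the result download request.
    """
    queries = 0
    while max_queries is None or queries < max_queries:
        queries += 1
        if query_exists():          # presence response received
            return download()       # send the result download request
        sleep(interval_seconds)     # absence response: wait, then query again
    raise TimeoutError("annotation data not available after polling")


# Simulated data annotation terminal: the third query reports completion.
responses = iter([False, False, True])
waits = []
result = fetch_annotation_data(
    query_exists=lambda: next(responses),
    download=lambda: ["img_001.jpg: face_box"],
    sleep=waits.append,             # record waits instead of really sleeping
)
```

In this simulation the loop waits twice for the 600-second (10-minute) interval before the download is triggered.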
In the above method for acquiring annotation data, an instance is constructed from the attribute information of the target model class; a data annotation request is automatically generated and sent to the data annotation terminal based on the annotation request method in the instance; a result query request is automatically sent to the data annotation terminal based on the result download method in the instance; and a result download request is automatically triggered when an annotation data presence response is received, so that the annotation data of the data annotation terminal is acquired. It can be understood that, by constructing an instance of the target model class and calling the methods in the instance, the steps by which the annotation request terminal acquires annotation data can be executed automatically as a pipeline, which reduces user operations and helps improve the efficiency with which the annotation request terminal acquires annotation data.
In one embodiment, the method involves an operation selection process for deciding whether the annotation data returned by the data annotation terminal needs further annotation. On the basis of the above embodiment, the method further comprises the steps of:
Step S212, based on the operation selection method in the instance, sending operation option information for the annotation data to the annotation request account of the annotation request terminal, and acquiring an operation instruction generated by the annotation request account in response to the operation option information;
Step S214, if the operation instruction is a continue-annotation instruction, sending a further data annotation request corresponding to the continue-annotation instruction to the data annotation terminal, based on the annotation request method in the instance that corresponds to the continue-annotation instruction.
Specifically, after obtaining the annotation data returned by the data annotation terminal, the annotation request terminal calls the operation selection method in the instance, and sends operation option information for the annotation data to the annotation request account of the annotation request terminal based on the operation selection method. After the annotation request account selects a target operation option, the annotation request terminal obtains the operation instruction corresponding to the target operation option. If the operation instruction is a continue-annotation instruction, the annotation request terminal sends a further data annotation request corresponding to the continue-annotation instruction to the data annotation terminal, based on the annotation request method in the instance that corresponds to the continue-annotation instruction, so that the data annotation terminal continues the data annotation work according to the data annotation request.
In a specific example, after the annotation request terminal obtains the annotation data, it sends operation option information for the annotation data to the annotation request account of the annotation request terminal. The operation option information includes continue-annotation operations, for example a re-annotation operation (annotation round plus 1), a reprocessing operation (processing round plus 1, annotation round plus 1) and a last-round operation (the data annotation of the last round is performed), and an end-annotation operation, namely a stop-annotation operation (the subsequent data processing method in the instance is called). The processing round can be understood as the round of data processing, whose purpose is to preliminarily process the data to be annotated into a data set on which model processing (for example, verification) can be performed.
In this embodiment, the operation option information is sent to the annotation request account, and the continue-annotation operation is performed according to the selection made by the annotation request account, so that data annotation can be processed in a loop and the data annotation process can be controlled flexibly.
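How a selected operation might update the round counters can be sketched as follows. The operation names and the state dictionary are hypothetical; only the counter increments for re-annotation and reprocessing follow the example in the text. The last-round option is omitted because the text does not spell out how it affects the counters.

```python
def apply_operation(state, operation):
    """Return the updated round counters and whether annotation continues.

    state holds 'processing_round' and 'annotation_round'; the operation names
    are illustrative assumptions.
    """
    new_state = dict(state)
    if operation == "re_annotate":
        new_state["annotation_round"] += 1
        return new_state, True
    if operation == "reprocess":
        new_state["processing_round"] += 1
        new_state["annotation_round"] += 1
        return new_state, True
    if operation == "stop":
        # End-annotation: the caller would move on to subsequent data processing.
        return new_state, False
    raise ValueError(f"unknown operation: {operation}")


state = {"processing_round": 1, "annotation_round": 1}
state, keep_going = apply_operation(state, "re_annotate")
state, keep_going = apply_operation(state, "reprocess")
final_state, finished_flag = apply_operation(state, "stop")
```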
In an embodiment, in combination with the previous embodiment, the method further includes the steps of:
Step S222, if the operation instruction is an end-annotation instruction, verifying whether the annotation data is qualified based on a result verification method in the instance, and, if the annotation data is verified as qualified, constructing an annotated data set from the annotation data.
Specifically, if the operation instruction is an end-annotation instruction, that is, the annotation request account has selected the end-annotation operation, the annotation request terminal calls the result verification method in the instance and verifies whether the annotation data is qualified based on that method. If the annotation data is verified as qualified, an annotated data set is constructed from the annotation data. If the annotation data fails verification, step S206 is executed again.
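The verify-then-build logic of step S222 can be sketched as below. The qualification rule used here (every record has a non-empty label) is purely an assumption; the application does not state what makes annotation data qualified.

```python
def build_dataset(annotation_data, is_qualified):
    """Return the annotated data set if all records pass verification, else None.

    A None result signals that the caller should return to step S206.
    """
    if annotation_data and all(is_qualified(record) for record in annotation_data):
        return list(annotation_data)
    return None


def has_label(record):
    # Hypothetical qualification rule: a record needs a non-empty label.
    return bool(record.get("label"))


good = [{"path": "img_001.jpg", "label": "face"}]
bad = [{"path": "img_002.jpg", "label": ""}]
dataset = build_dataset(good, has_label)
rejected = build_dataset(bad, has_label)
```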
In this embodiment, the continue-annotation or end-annotation operation is performed according to the selection made by the annotation request account, so that annotation can be repeated or terminated midway, which allows flexible control of the data annotation process.
In one embodiment, the method involves sending interactive messages to the data annotation account and the annotation request account, so that the corresponding users are notified to handle their tasks promptly. On the basis of the above embodiment, the method further comprises the steps of:
Step S232, sending the annotation request message corresponding to the data annotation request to the data annotation account of the data annotation terminal;
and/or
Step S234, sending the operation selection message corresponding to the operation option information to the annotation request account of the annotation request terminal.
Specifically, the annotation request terminal may send the corresponding annotation request message to the data annotation account of the data annotation terminal when the data annotation request is generated, when the data annotation request is sent, or after the data annotation request has been sent. In this way the data annotation account is notified to complete the data annotation work in time. For example, the annotation request message is sent to the Feishu (Lark) group of the data annotation terminal and the corresponding data annotation account is @-mentioned, so that the annotators are notified. It is understood that this embodiment does not limit the sending timing of the annotation request message.
The annotation request terminal may send the operation selection message corresponding to the operation option information to the annotation request account either when, or after, it sends the operation option information for the annotation data to the annotation request account of the annotation request terminal. For example, the operation selection message is sent to the Feishu group of the annotation request terminal and the corresponding annotation request account is @-mentioned, so that the annotation requester is notified to complete the operation selection in time. It is to be understood that this embodiment does not limit the sending timing of the operation selection message.
In this embodiment, interaction with the relevant personnel is established promptly through message notifications. For example, a communication connection is established between Airflow and the application programming interface (API) provided by the Feishu bot, so that Airflow can deliver notification messages to users in time and prompt them to handle the relevant tasks as soon as possible, which helps improve processing efficiency. Feishu is a new-generation enterprise office application that supports information exchange among colleagues; in addition, compared with competing products, the Feishu bot it provides makes it convenient to exchange information between a software system and a person.
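The notification step can be sketched generically as follows. The `send` callable is a hypothetical stand-in for whatever messaging API is used; no real bot API is shown, and the group and account names are invented.

```python
def notify(send, group, account, text):
    """Send a message to a group chat and mention one account.

    send is a hypothetical callable taking (group, message); a real deployment
    would replace it with a call to the messaging platform's bot API.
    """
    message = f"@{account} {text}"
    send(group, message)
    return message


# Collect messages in a list instead of calling a real messaging service.
outbox = []
notify(lambda group, message: outbox.append((group, message)),
       group="annotation-team",
       account="annotator_li",
       text="A new face_box annotation task is ready.")
```

Injecting the sender keeps the pipeline code independent of any particular messaging product.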
In one embodiment, the method further comprises the steps of:
Step S242, based on the data splitting method in the instance, splitting the initial data to be annotated according to annotation type to obtain the data to be annotated.
Taking an image to be annotated as an example, the annotation type may be annotating detection boxes, such as human body boxes, face boxes and vehicle body boxes; annotating image labels, such as whether the image is blurred, whether a face is present, or whether a vehicle is present; or image comparison, for example, given a main image containing a face and several auxiliary images containing faces, annotating which auxiliary images show the same person as the main image.
Specifically, the annotation request terminal copies the initial data to be annotated to a preset server path. The annotation request terminal then calls the data splitting method in the instance and, based on that method, splits the initial data to be annotated according to annotation type to obtain the split data to be annotated. It can be understood that there are multiple pieces of data to be annotated in this embodiment. For example, the annotation request terminal splits the initial images to be annotated according to the type of detection box they contain, grouping the images that contain human body boxes, the images that contain face boxes, and the images that contain vehicle body boxes, to obtain three types of data to be annotated.
Optionally, after the data is split, the data to be annotated is arranged into the data format required for annotation.
In this embodiment, because a large data volume easily affects the stability of the data annotation terminal, the large initial data to be annotated is split by annotation type before being sent, producing smaller sets of data to be annotated of different annotation types, which facilitates subsequent import and annotation by the annotators.
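Splitting by annotation type can be sketched as a simple grouping. The record layout (a `path` and an `annotation_type` per item) and the type names are assumptions for illustration.

```python
from collections import defaultdict


def split_by_annotation_type(initial_items):
    """Group the initial data to be annotated by its annotation type."""
    groups = defaultdict(list)
    for item in initial_items:
        groups[item["annotation_type"]].append(item["path"])
    return dict(groups)


initial = [
    {"path": "a.jpg", "annotation_type": "body_box"},
    {"path": "b.jpg", "annotation_type": "face_box"},
    {"path": "c.jpg", "annotation_type": "vehicle_box"},
    {"path": "d.jpg", "annotation_type": "face_box"},
]
batches = split_by_annotation_type(initial)
```

Each resulting group is a smaller batch that could be sent to the data annotation terminal as a separate task.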
Further, in an embodiment, the step S206 may be specifically implemented by the following steps:
Step S2062: based on the annotation request method in the instance, an annotation task is generated according to the data to be annotated and the annotation requirement information, and a data annotation request carrying the annotation task is sent to the data annotation end, so that the data annotation end annotates the data to be annotated as annotated data according to the annotation task in the data annotation request.
The annotation requirement information may include the annotation type, annotator information, and the like.
Specifically, the annotation request end calls the annotation request method in the instance. Based on the preset annotation rules in that method, an annotation dispatch tool generates an annotation task from the data to be annotated and the annotation requirement information; a data annotation request is then generated from the annotation task and sent, carrying the task, to the data annotation end. In addition, after the annotation task is created successfully, an annotation request message corresponding to the data annotation request is sent to the data annotation account of the data annotation end. For example, the message is sent to a Feishu group and the corresponding data annotation account is @-mentioned (an association between data annotation accounts and annotation types is established in advance), so that the annotators are notified. The data annotation end receives the data annotation request and parses it to obtain the annotation task, and then annotates the data to be annotated as annotated data according to the task.
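As a rough illustration of how a request might carry an annotation task, the sketch below serializes a task to JSON on the request side and parses it on the annotation side. The class name `AnnotationTask`, its fields, and the JSON layout are all assumptions made for this example; the source does not specify a wire format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable, List


@dataclass
class AnnotationTask:
    """Hypothetical task record; field names are illustrative only."""
    annotation_type: str
    annotators: List[str]
    items: List[str]


def build_annotation_request(items: Iterable[str], requirement: dict) -> str:
    """Request side: build a task from data plus requirement info, then wrap it."""
    task = AnnotationTask(
        annotation_type=requirement["annotation_type"],
        annotators=list(requirement["annotators"]),
        items=list(items),
    )
    return json.dumps({"task": asdict(task)})


def parse_annotation_request(payload: str) -> AnnotationTask:
    """Annotation side: parse the request to recover the task."""
    return AnnotationTask(**json.loads(payload)["task"])


requirement = {"annotation_type": "face_box", "annotators": ["annotator_1"]}
payload = build_annotation_request(["a.jpg", "b.jpg"], requirement)
task = parse_annotation_request(payload)
```

Notifying the annotators (for example through a group message) would be a separate step triggered once the task has been created.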
In an embodiment, step S202 may be specifically implemented by the following steps:
step S2022, acquiring a specific attribute value of the target model class input by the annotation request account through the annotation request terminal;
step S2024, reading basic attributes, basic attribute values and specific attributes of the pre-stored basic model class;
step S2026, assigning the specific attribute value to the specific attribute of the basic model class so as to convert the basic model class into a target model class;
step S2028, determine the basic attribute, the basic attribute value, the specific attribute, and the specific attribute value in the target model class as the attribute information of the target model class.
The basic attributes of the basic model class, together with their basic attribute values, can be inherited by the target model class. The basic model class also includes abstract methods such as the annotation request method and the result download method. The specific attributes and specific attribute values are attribute information particular to the target model class.
Specifically, the annotation request end stores a basic model class in advance. The basic model class is configured with basic attributes, basic attribute values corresponding to the basic attributes and specific attributes. The annotation request account can input the specific attribute value of the target model class through a display interface of the annotation request terminal, and then assigns the specific attribute value to the specific attribute of the basic model class, so that the basic model class is converted into the target model class. And the marking request end determines the basic attribute, the basic attribute value, the specific attribute and the specific attribute value in the target model class as the attribute information of the target model class.
Optionally, the annotation data acquisition method may be run in Airflow, a platform for programmatically creating, scheduling, and monitoring tasks. Airflow's mechanism for triggering a directed acyclic graph (DAG) with externally supplied parameters is used to obtain the specific attribute values of the target model class, which the annotation request account fills in on the Airflow front-end page of the annotation request end. Optionally, the specific attributes include the target model type, the processing round, the annotation round, and the like. The processing round and the annotation round determine which data splitting method and which annotation request method in the instance are used.
In this embodiment, by presetting the basic model class, when the target model class is constructed, the attribute information of the basic model class can be inherited quickly, and a specific attribute value is assigned to the basic model class to obtain the target model class, so that the attribute information of the target model class can be determined quickly.
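One way to picture steps S2022 to S2028 is with an abstract base class whose basic attributes already have values and whose specific attributes are filled in later. The sketch below is an assumption-laden illustration: the names `BaseModelClass`, `ReidModelClass`, `storage_root`, and the attribute names are hypothetical, and the source does not state which language features are used.

```python
from abc import ABC, abstractmethod

# Hypothetical names of the specific attributes (model type and rounds).
SPECIFIC_ATTRIBUTES = ("model_type", "processing_round", "annotation_round")


class BaseModelClass(ABC):
    storage_root = "/data/to_annotate"  # hypothetical basic attribute with a value
    model_type = None                   # specific attributes: declared, not yet valued
    processing_round = None
    annotation_round = None

    @abstractmethod
    def request_annotation(self, data):
        """Abstract annotation request method."""

    @abstractmethod
    def download_result(self):
        """Abstract result download method."""


class ReidModelClass(BaseModelClass):
    def request_annotation(self, data):
        return f"{self.model_type}: round {self.annotation_round}, {len(data)} items"

    def download_result(self):
        return []


def attribute_info(base_cls, specific_values: dict) -> dict:
    """Combine inherited basic attributes with user-supplied specific values."""
    unknown = set(specific_values) - set(SPECIFIC_ATTRIBUTES)
    if unknown:
        raise ValueError(f"unknown specific attributes: {sorted(unknown)}")
    info = {"storage_root": base_cls.storage_root}
    info.update({name: specific_values.get(name) for name in SPECIFIC_ATTRIBUTES})
    return info


def build_instance(cls, info: dict):
    """Construct an instance of the target model class from attribute information."""
    instance = cls()
    for name, value in info.items():
        setattr(instance, name, value)
    return instance


info = attribute_info(
    BaseModelClass,
    {"model_type": "reid", "processing_round": 1, "annotation_round": 1},
)
instance = build_instance(ReidModelClass, info)
```

In an Airflow deployment, the dictionary of specific values would presumably come from the parameters supplied when the DAG run is triggered; that wiring is omitted here.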
An embodiment of the present application is described below with reference to a specific application scenario.
Take a pedestrian re-identification (ReID) model as an example.
The data splitting method downloads the initial images to be annotated that have flowed back from online services. The initial data may include videos and images in pb (Protocol Buffers) format. It is parsed and classified according to a specific file, so that the initial images to be annotated can be split and classified by unique pid (PersonID). Because the classification result is the output of the business line's algorithm engineering logic, it will contain errors, so the subsequent cleaning rounds are needed.
First round, cleaning: to ensure that each pid folder contains images of only one person, the images to be annotated under each pid need to be cleaned. The highest-quality image under the pid folder is selected as the main image, that is, as the representative of the pid, and the other images under the pid are then checked against the main image to see whether they show the same person. The processed images are arranged into data in the format required for annotation and sent for manual annotation. Finally, according to the manually annotated results, images under each pid that do not show the same person as the main image are removed.
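The first-round cleaning logic can be sketched as below. The `quality` score and the `same_person` verdict mapping stand in for the image-quality measure and the manual annotation results, neither of which is specified in the source; treating an image with no verdict as "not the same person" is also an assumption.

```python
from typing import Dict, List


def pick_main_image(images: List[dict]) -> dict:
    """Choose the highest-quality image under a pid as its representative."""
    return max(images, key=lambda image: image["quality"])


def clean_pid(images: List[dict], same_person: Dict[str, bool]) -> List[dict]:
    """Keep the main image plus images annotated as the same person."""
    main = pick_main_image(images)
    kept = [main]
    for image in images:
        if image is main:
            continue
        # Images without a verdict are dropped (an assumption of this sketch).
        if same_person.get(image["name"], False):
            kept.append(image)
    return kept


pid_images = [
    {"name": "1.jpg", "quality": 0.4},
    {"name": "2.jpg", "quality": 0.9},
    {"name": "3.jpg", "quality": 0.7},
]
verdicts = {"1.jpg": True, "3.jpg": False}  # hypothetical manual annotation results
kept_names = [image["name"] for image in clean_pid(pid_images, verdicts)]
```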
Second round, merging: because classification by the algorithm engineering logic may assign the same person to different pids, all pids need to be merged according to preset merging conditions. Specifically, the ReID algorithm model extracts features from all images to be annotated, an average feature is computed for each pid, and finally the feature distance between each pid and every other pid is computed (the feature distance expresses the dissimilarity of two feature matrices; the greater the distance, the less alike the two people). From the feature distances, the pids similar to each pid are identified, and the main images of those pids are grouped together to form the final images to be annotated. According to the manually annotated results, the pids that can be merged are then merged together.
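The merge-candidate computation described above can be illustrated with a small sketch. Euclidean distance and the fixed `threshold` are assumptions chosen for this example; the source only speaks of a "feature distance" and "preset merging conditions", and real features would come from the ReID model rather than the toy vectors used here.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def mean_feature(vectors: List[List[float]]) -> List[float]:
    """Average the per-image feature vectors of one pid, dimension by dimension."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def merge_candidates(
    features_by_pid: Dict[str, List[List[float]]], threshold: float
) -> List[Tuple[str, str, float]]:
    """Return pid pairs whose mean features lie within `threshold` of each other."""
    means = {pid: mean_feature(vectors) for pid, vectors in features_by_pid.items()}
    pairs = []
    for pid_a, pid_b in combinations(sorted(means), 2):
        # Euclidean distance is an assumption; larger means less alike.
        distance = math.dist(means[pid_a], means[pid_b])
        if distance <= threshold:
            pairs.append((pid_a, pid_b, distance))
    return pairs


# Toy two-dimensional features, purely illustrative.
features = {
    "pid_1": [[0.0, 0.0], [0.2, 0.0]],
    "pid_2": [[0.1, 0.1]],
    "pid_3": [[5.0, 5.0]],
}
candidates = merge_candidates(features, threshold=0.5)
```

The main images of each candidate pair would then be sent for manual annotation, and only pairs confirmed by annotators would actually be merged.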
Third round, cleaning: similar to the first round of cleaning, this round mainly eliminates errors introduced by the second round of merging and provides a second confirmation of how clean the data is.
Fourth round, data verification: after the first three rounds of processing, the annotated images can be regarded as data that meets requirements. The ReID model is run on the annotated images for inference and evaluation, and the model's metrics on the data set and some error cases are inspected. If the annotated images meet expectations, they are organized and uploaded to the data set platform; if deviations remain, the process starts again from the first round.
To address the long construction cycle of data sets and the heavy cost in manpower and working hours, the embodiment of the application designs a pipelined data set construction system. Based on the scheduling of the open-source platform Airflow, it summarizes and optimizes the data set construction processes of different algorithm models, designs different workflows according to the needs of each process and schedules them accordingly, and automates the steps that involve manual annotation, namely sending annotation requests and processing annotation results. The result is a complete, automated data set construction pipeline that keeps data set construction controllable for algorithm and test engineers while making it simpler and more efficient.
In the embodiment of the application, construction of the annotation data set is fully automated, which greatly improves working efficiency. The design overcomes Airflow's restriction that a workflow cannot contain loops, so the flow can be controlled flexibly and can be interrupted repeatedly. Each data set is built by starting a pipeline in the system; each pipeline is completely independent, much like a process, so pipelines for multiple data sets can run at the same time, achieving concurrency. With multiple processing rounds and multiple annotation rounds, if the annotators' first-round results are inaccurate, a second round, a third round, and so on can be sent until the annotation results meet the accuracy requirement. Therefore, data construction tasks for different algorithm types can run simultaneously, and the system is highly fault tolerant with respect to annotation. Airflow serves as the basic tool of the pipeline; because Airflow supports distributed deployment, worker nodes can be deployed in a distributed manner to balance load as far as possible. The system is highly compatible, its interface design is open, and it can meet the data set construction needs of different algorithm models. Related data is managed in a unified manner, which helps ensure the accuracy and security of data processing.
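The multi-round idea (keep sending further annotation rounds until the result is accurate enough) can be sketched as a plain loop. The callables `annotate_round` and `is_accurate` and the `max_rounds` cap are hypothetical placeholders; how the actual system re-triggers a workflow inside Airflow is not described in enough detail to reproduce here.

```python
from typing import Callable, Tuple


def run_annotation_rounds(
    annotate_round: Callable[[int], dict],
    is_accurate: Callable[[dict], bool],
    max_rounds: int = 3,
) -> Tuple[int, dict]:
    """Send round 1, 2, 3, ... until the result passes the accuracy check."""
    for round_number in range(1, max_rounds + 1):
        result = annotate_round(round_number)
        if is_accurate(result):
            return round_number, result
    raise RuntimeError(f"accuracy requirement not met after {max_rounds} rounds")


# Hypothetical stand-ins: accuracy improves with each round.
def fake_annotate(round_number: int) -> dict:
    return {"round": round_number, "accuracy": 0.80 + 0.07 * round_number}


rounds_used, final_result = run_annotation_rounds(
    fake_annotate, lambda result: result["accuracy"] >= 0.9
)
```

A cap such as `max_rounds` is a design choice of this sketch so that the loop always terminates; the source itself only says that rounds continue until the accuracy requirement is met.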
It should be understood that, although the steps in the flowcharts of the embodiments described above are displayed in sequence as indicated by the arrows, the steps are not necessarily performed in that sequence. Unless explicitly stated otherwise herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different times, and which are not necessarily performed one after another but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiment of the present application further provides an apparatus for acquiring annotation data, which is used for implementing the above-mentioned method for acquiring annotation data. The implementation scheme for solving the problem provided by the apparatus is similar to the implementation scheme described in the above method, so specific limitations in the following embodiments of one or more apparatuses for acquiring annotation data may refer to the above limitations on the method for acquiring annotation data, and are not described herein again.
In one embodiment, as shown in fig. 3, there is provided an apparatus for acquiring annotation data, including: an information obtaining module 302, an instance building module 304, an annotation requesting module 306, and a result downloading module 308, wherein:
an information obtaining module 302, configured to obtain attribute information of a target model class;
an instance building module 304, configured to build an instance of the target model class according to the attribute information;
the annotation request module 306 is configured to send a data annotation request corresponding to data to be annotated to a data annotation end based on the annotation request method in the instance, so that the data annotation end annotates the data to be annotated as annotated data according to the data annotation request;
the result downloading module 308 is configured to send, based on the result download method in the instance, a result query request to the data annotation end asking whether annotated data exists; to send a result download request to the data annotation end if an annotated-data-exists response returned by the data annotation end in response to the result query request is received; and to receive the annotated data returned by the data annotation end in response to the result download request.
In the above apparatus for acquiring annotation data, an instance is constructed from the attribute information of the target model class. Based on the annotation request method in the instance, a data annotation request is automatically generated and sent to the data annotation end; based on the result download method in the instance, a result query request is automatically sent to the data annotation end, and a result download request is automatically triggered when an annotated-data-exists response is received, thereby obtaining the annotated data from the data annotation end. By constructing an instance of the target model class and calling the methods in the instance, the annotation request end's acquisition of annotation data can be executed automatically as a pipeline, which reduces user operations and improves the efficiency with which the annotation request end acquires annotation data.
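The query-then-download behaviour of the result download method can be sketched as a polling loop. `FakeAnnotationEnd` is a stand-in written for this example, and the method names `query_result` and `download_result`, the polling interval, and the poll limit are all assumptions rather than anything defined in the source.

```python
import time
from typing import List, Optional


class FakeAnnotationEnd:
    """Stand-in for the data annotation end; reports data after N queries."""

    def __init__(self, ready_after: int) -> None:
        self.ready_after = ready_after
        self.queries = 0

    def query_result(self) -> bool:
        # Answers the result query request: does annotated data exist yet?
        self.queries += 1
        return self.queries >= self.ready_after

    def download_result(self) -> List[dict]:
        # Answers the result download request.
        return [{"image": "a.jpg", "label": "face"}]


def fetch_annotated_data(
    annotation_end, poll_interval: float = 0.0, max_polls: int = 10
) -> Optional[List[dict]]:
    """Query repeatedly; download only after an 'annotated data exists' response."""
    for _ in range(max_polls):
        if annotation_end.query_result():
            return annotation_end.download_result()
        time.sleep(poll_interval)
    return None  # no annotated data appeared within the polling budget


end = FakeAnnotationEnd(ready_after=3)
annotated = fetch_annotated_data(end)
```

Separating the cheap existence query from the download means the request end never asks for a result file that has not been produced yet.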
In one embodiment, the apparatus further comprises:
an operation selection module, configured to send operation option information for the annotated data to the annotation request account of the annotation request end based on the operation selection method in the instance, and to acquire the operation instruction generated by the annotation request account in response to the operation option information;
and an annotation continuation module, configured to, if the operation instruction is an annotation continuation instruction, continue sending the data annotation request corresponding to the annotation continuation instruction to the data annotation end based on the annotation request method in the instance that corresponds to the annotation continuation instruction.
In one embodiment, the apparatus further comprises:
an annotation ending module, configured to, if the operation instruction is an annotation ending instruction, verify whether the annotated data is qualified based on the result verification method in the instance, and, if the annotated data is verified as qualified, construct an annotated data set from the annotated data.
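The branching between the continuation and ending instructions can be illustrated as below. The instruction strings `"continue"` and `"end"` and the callables passed in are hypothetical; the source does not define how instructions are encoded or what the verification checks.

```python
from typing import Callable, List, Optional


def handle_operation(
    instruction: str,
    annotated: List[dict],
    send_request: Callable[[List[dict]], None],
    verify: Callable[[List[dict]], bool],
) -> Optional[List[dict]]:
    """Dispatch on the operation instruction chosen by the annotation request account."""
    if instruction == "continue":
        # Annotation continuation: send another data annotation request.
        send_request(annotated)
        return None
    if instruction == "end":
        # Annotation ending: build the data set only if verification passes.
        return list(annotated) if verify(annotated) else None
    raise ValueError(f"unknown operation instruction: {instruction!r}")


sent_requests: List[List[dict]] = []
data = [{"image": "a.jpg", "label": "face"}]
all_labelled = lambda items: all("label" in item for item in items)

continued = handle_operation("continue", data, sent_requests.append, all_labelled)
dataset = handle_operation("end", data, sent_requests.append, all_labelled)
```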
In one embodiment, the apparatus further comprises:
a data splitting module, configured to split the initial data to be annotated according to annotation type based on the data splitting method in the instance, to obtain the data to be annotated;
the annotation request module 306 is specifically configured to generate an annotation task according to the data to be annotated and annotation requirement information based on the annotation request method in the instance, and to send a data annotation request carrying the annotation task to the data annotation end, so that the data annotation end annotates the data to be annotated as annotated data according to the annotation task in the data annotation request.
In an embodiment, the information obtaining module 302 is specifically configured to obtain a specific attribute value of a target model class input by an annotation request account through an annotation request end; reading basic attributes, basic attribute values and specific attributes of the pre-stored basic model class; assigning the specific attribute value to the specific attribute of the basic model class so as to convert the basic model class into a target model class; and determining the basic attribute, the basic attribute value, the specific attribute and the specific attribute value in the target model class as the attribute information of the target model class.
The modules in the above apparatus for acquiring annotation data may be implemented in whole or in part by software, hardware, or a combination of the two. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 4. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless communication can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a method of retrieving annotation data. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the architecture shown in fig. 4 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is further provided, which includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the above method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
In an embodiment, a computer program product is provided, comprising a computer program which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
It should be noted that, the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party.
It will be understood by those skilled in the art that all or part of the processes of the methods in the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, Resistive Random Access Memory (ReRAM), Magnetic Random Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene memory, and the like. Volatile memory can include Random Access Memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of relational and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database. The processors referred to in the embodiments provided herein may be, without limitation, general purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, data processing logic devices based on quantum computing, and the like.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above embodiments express only several implementations of the present application, and although their description is relatively specific and detailed, they should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, and these all fall within the scope of protection of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims (10)

1. A method for acquiring annotation data is applied to an annotation request end, and comprises the following steps:
acquiring attribute information of a target model class;
constructing an instance of the target model class according to the attribute information;
based on the annotation request method in the instance, sending a data annotation request corresponding to data to be annotated to a data annotation end, so that the data annotation end annotates the data to be annotated as annotated data according to the data annotation request;
based on the result download method in the instance, sending to the data annotation end a result query request asking whether the annotated data exists; if an annotated-data-exists response returned by the data annotation end in response to the result query request is received, sending a result download request to the data annotation end; and receiving the annotated data returned by the data annotation end in response to the result download request.
2. The method of claim 1, further comprising:
based on the operation selection method in the instance, sending operation option information for the annotated data to an annotation request account of the annotation request end, and acquiring an operation instruction generated by the annotation request account in response to the operation option information;
and if the operation instruction is an annotation continuation instruction, continuing to send the data annotation request corresponding to the annotation continuation instruction to the data annotation end based on the annotation request method in the instance that corresponds to the annotation continuation instruction.
3. The method of claim 2, further comprising:
and if the operation instruction is an annotation ending instruction, verifying whether the annotated data is qualified based on the result verification method in the instance, and if the annotated data is verified as qualified, constructing an annotated data set from the annotated data.
4. A method according to claim 2 or 3, characterized in that the method further comprises:
sending a marking request message corresponding to the data marking request to a data marking account of the data marking end;
and/or,
and sending the operation selection message corresponding to the operation option information to the annotation request account of the annotation request terminal.
5. The method of claim 1, further comprising:
splitting initial data to be annotated according to annotation type based on the data splitting method in the instance, to obtain the data to be annotated;
the sending, based on the annotation request method in the instance, of a data annotation request corresponding to data to be annotated to a data annotation end, so that the data annotation end annotates the data to be annotated as annotated data according to the data annotation request, comprises:
based on the annotation request method in the instance, generating an annotation task according to the data to be annotated and annotation requirement information, and sending a data annotation request carrying the annotation task to the data annotation end, so that the data annotation end annotates the data to be annotated as annotated data according to the annotation task in the data annotation request.
6. The method of claim 1, wherein the obtaining attribute information of the target model class comprises:
acquiring a specific attribute value of a target model class input by a labeling request account through a labeling request terminal;
reading basic attributes, basic attribute values and specific attributes of the pre-stored basic model class;
assigning the specific attribute value to a specific attribute of the basic model class so as to convert the basic model class into a target model class;
and determining the basic attribute, the basic attribute value, the specific attribute and the specific attribute value in the target model class as the attribute information of the target model class.
7. An apparatus for obtaining annotation data, wherein the apparatus is used for an annotation request side, and comprises:
the information acquisition module is used for acquiring the attribute information of the target model class;
the instance building module is used for building an instance of the target model class according to the attribute information;
the annotation request module is used for sending a data annotation request corresponding to data to be annotated to a data annotation end based on the annotation request method in the instance, so that the data annotation end annotates the data to be annotated as annotated data according to the data annotation request;
and the result downloading module is used for sending, based on the result download method in the instance, a result query request to the data annotation end asking whether the annotated data exists; sending a result download request to the data annotation end if an annotated-data-exists response returned by the data annotation end in response to the result query request is received; and receiving the annotated data returned by the data annotation end in response to the result download request.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 6.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program realizes the steps of the method of any one of claims 1 to 6 when executed by a processor.
CN202111666364.XA 2021-12-30 2021-12-30 Method and device for acquiring marking data, computer equipment and storage medium Pending CN114329128A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111666364.XA CN114329128A (en) 2021-12-30 2021-12-30 Method and device for acquiring marking data, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111666364.XA CN114329128A (en) 2021-12-30 2021-12-30 Method and device for acquiring marking data, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114329128A true CN114329128A (en) 2022-04-12

Family

ID=81020372

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111666364.XA Pending CN114329128A (en) 2021-12-30 2021-12-30 Method and device for acquiring marking data, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114329128A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114820618A (en) * 2022-06-29 2022-07-29 心鉴智控(深圳)科技有限公司 Defect detection model training method, device, equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114820618A (en) * 2022-06-29 2022-07-29 心鉴智控(深圳)科技有限公司 Defect detection model training method, device, equipment and storage medium
CN114820618B (en) * 2022-06-29 2022-09-13 心鉴智控(深圳)科技有限公司 Defect detection model training method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
US8607190B2 (en) Automation of software application engineering using machine learning and reasoning
US11416754B1 (en) Automated cloud data and technology solution delivery using machine learning and artificial intelligence modeling
US11481412B2 (en) Data integration and curation
CN111242317B (en) Method, device, computer equipment and storage medium for managing application
CN111368789B (en) Image recognition method, device, computer equipment and storage medium
CN114722281B (en) Training course configuration method and device based on user portrait and user course selection behavior
CN102880683A (en) Automatic network generation system for feasibility study report and generation method thereof
CN114329128A (en) Method and device for acquiring marking data, computer equipment and storage medium
CN113918437A (en) User behavior data analysis method and device, computer equipment and storage medium
CN117473130A (en) Service processing method, device, equipment, medium and program product
CN108427599A (en) Method, apparatus and storage medium is uniformly processed in asynchronous task
CN115062084B (en) Method and device for constructing API (application programming interface) based on database metadata
US20230117893A1 (en) Machine learning techniques for environmental discovery, environmental validation, and automated knowledge repository generation
CN115438812A (en) Life-saving management method and device for power transmission equipment, computer equipment and storage medium
CN112579149B (en) Method, device, equipment and storage medium for generating model training program mirror image
CN114781557B (en) Image information acquisition method and device and computer-readable storage medium
CN109344166B (en) Database monitoring method, computer readable storage medium and terminal device
US20240144198A1 (en) Machine Learning-based Knowledge Management for Incident Response
WO2022089613A1 (en) Text classification method and apparatus using machine learning, and electronic device
CN118155383A (en) Early warning processing method, early warning processing device, computer equipment and storage medium
CN114840237A (en) Updating method and device of flow program code, computer equipment and storage medium
CN117573387A (en) Message pushing method, device, computer equipment and storage medium
CN117251621A (en) Service matching method, device, computer equipment, storage medium and product
CN116680467A (en) Object recommendation method, device, computer equipment and storage medium
CN114091769A (en) Federal learning modeling optimization method based on feature engineering

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination