CN112465032A - Distribution method and device of training data labeling tasks and computing equipment - Google Patents

Distribution method and device of training data labeling tasks and computing equipment

Info

Publication number
CN112465032A
CN112465032A (application CN202011364814.5A)
Authority
CN
China
Prior art keywords
task
training data
type
terminal
tasks
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011364814.5A
Other languages
Chinese (zh)
Inventor
刘静修
季俊
张言
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Xinyi Intelligent Technology Co.,Ltd.
Original Assignee
Beijing Xinyi Intelligent Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xinyi Intelligent Information Technology Co ltd filed Critical Beijing Xinyi Intelligent Information Technology Co ltd
Priority to CN202011364814.5A
Publication of CN112465032A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

A distribution method and device for training data labeling tasks, a storage medium and a computing device are provided. The method comprises the following steps: receiving task creation information, wherein the task creation information indicates a plurality of training data labeling tasks and the type of each training data labeling task; for each type of training data labeling task, identifying a task execution terminal matching the type; and distributing each type of training data labeling task to the task execution terminal matching it. By the scheme of the invention, a large number of training data labeling tasks can be automatically and efficiently distributed to the task execution terminals, improving distribution efficiency.

Description

Distribution method and device of training data labeling tasks and computing equipment
Technical Field
The invention relates to the technical field of computers, in particular to a method and a device for distributing training data labeling tasks, a storage medium and computing equipment.
Background
Training data labeling refers to the process of describing or labeling training data such as text, pictures and voice (for example, marking positions such as the outer canthus of the left eye and the outer canthus of the right eye on a face sample image) so that the labeled training data can be used for machine learning. With technical development in fields such as artificial intelligence, data annotation demanders (for example, technology companies in the artificial intelligence field) have a growing demand for training data annotation. A data annotation demander may send a large amount of training data to be annotated to a data annotation platform; the platform then needs to create a large number of training data annotation tasks for that data, which are annotated through the task execution terminals of multiple data annotation executors (for example, parties that specialize in data annotation).
Therefore, a method for allocating training data labeling tasks is needed to efficiently allocate a large number of training data labeling tasks, so as to further improve the efficiency of completing the training data labeling tasks.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a method for distributing training data labeling tasks that can efficiently distribute a large number of such tasks, thereby further improving the efficiency with which they are completed.
In order to solve the above technical problem, an embodiment of the present invention provides a method for allocating training data labeling tasks, the method comprising: receiving task creation information, wherein the task creation information indicates a plurality of training data labeling tasks and the type of each training data labeling task; for each type of training data labeling task, identifying a task execution terminal matching the type; and distributing each type of training data labeling task to the task execution terminal matching it.
Optionally, the step of allocating each type of training data labeling task to the matching task execution terminal includes: for each type of training data labeling task, dividing the tasks of that type according to the number of task execution terminals matching the type, to obtain the training data labeling tasks each task execution terminal needs to complete; and, for each task execution terminal, adding the training data labeling tasks it needs to complete to that terminal's task list.
Optionally, the step of allocating each type of training data labeling task to the matching task execution terminal further includes: monitoring the state information of each task execution terminal and, when the state information indicates that a task execution terminal is idle, selecting the next training data labeling task from its task list; and sending the training data corresponding to that next task to the terminal.
Optionally, the task list includes first progress information, where the first progress information is the ratio of the number of completed training data labeling tasks to the number of training data labeling tasks to be completed, and the method further includes: monitoring the first progress information of each task execution terminal; when the first progress information of any task execution terminal reaches a first preset threshold, judging whether the tasks to be completed need to be redistributed, where the tasks to be completed are the training data labeling tasks not yet completed by the first-type terminals, the first-type terminals are all task execution terminals matching the same type as the first terminal, and the first terminal is the task execution terminal whose first progress information reached the first preset threshold; and, if so, reallocating the tasks to be completed among the first-type terminals.
Optionally, judging whether the tasks to be completed need to be redistributed includes: counting the number of tasks to be completed and calculating the mean number of tasks to be completed per first-type terminal; and comparing the number of training data labeling tasks not completed by the first terminal with that mean; if the two satisfy a preset condition, it is judged that the tasks to be completed need to be redistributed.
Optionally, the task list further includes second progress information, and reassigning the tasks to be completed among the first-type terminals includes: reading the second progress information of the first-type terminals, where the second progress information is the number of training data labeling tasks completed within a preset time; and dividing the tasks to be completed according to the second progress information of the first-type terminals, so as to update the training data labeling tasks each first-type terminal needs to complete.
Optionally, each task execution terminal has a type label, and, for each type of training data labeling task, identifying the task execution terminal matching the type includes: searching for task execution terminals whose type label is the same as the type and/or is a blank label, to obtain the task execution terminals matching the type.
Optionally, before dividing the tasks to be completed according to the second progress information of the first-type terminals, the method further includes: if the type label of at least one task execution terminal among the first-type terminals is a blank label and the second progress information of that terminal reaches a second preset threshold, modifying its type label to the same label as the first type, where the first type is the type matching the first terminal.
Optionally, if the type label of at least one task execution terminal among the first-type terminals is a non-blank label and the second progress information of that terminal is smaller than the second preset threshold, the type label of that terminal is removed, where the first type is the type matching the first terminal.
In order to solve the above technical problem, an embodiment of the present invention further provides an apparatus for allocating training data labeling tasks, the apparatus comprising: a receiving module for receiving task creation information, the task creation information indicating a plurality of training data labeling tasks and the type of each training data labeling task; an identification module for identifying, for each type of training data labeling task, the task execution terminal matching the type; and a distribution module for distributing each type of training data labeling task to the matching task execution terminal.
The embodiment of the present invention further provides a storage medium, on which a computer program is stored, where the computer program, when executed by a processor, performs the steps of the method for allocating training data tagging tasks.
The embodiment of the present invention further provides a computing device, which includes a memory and a processor, where the memory stores a computer program capable of running on the processor, and the processor performs the steps of the method for allocating training data labeling tasks when running the computer program.
Compared with the prior art, the technical scheme of the embodiments of the invention has the following beneficial effects. An embodiment of the invention provides a method for distributing training data labeling tasks, comprising: receiving task creation information, wherein the task creation information indicates a plurality of training data labeling tasks and the type of each; for each type of training data labeling task, identifying a task execution terminal matching the type; and distributing each type of training data labeling task to the matching task execution terminal. In the embodiment of the invention, the received task creation information includes the types of the training data labeling tasks, so the data labeling platform can look up a matching task execution terminal for each type and, when distributing the tasks, can send each training data labeling task to a task execution terminal matching its type. A large number of training data labeling tasks can thus be efficiently distributed to suitable task execution terminals, improving labeling efficiency.
Further, in the embodiment of the present invention, the data labeling platform may also monitor the first progress information of each task execution terminal. When the first progress information of any task execution terminal reaches the first preset threshold, that is, when that terminal's completion progress is relatively fast, the platform further compares that terminal's pending workload with the average over all task execution terminals matching the same type; if the two satisfy the preset condition, that is, when the gap between them is large, the platform reallocates the tasks not yet completed by those terminals among the matched task execution terminals. The scheme of the embodiment of the invention can therefore dynamically adjust the number of tasks allocated to each task execution terminal according to completion progress, making the distribution of training data labeling tasks more reasonable and further improving labeling efficiency.
Drawings
Fig. 1 is a schematic application scenario diagram of an allocation method for training data labeling tasks in an embodiment of the present invention.
Fig. 2 is a flowchart illustrating a method for allocating training data labeling tasks according to an embodiment of the present invention.
Fig. 3 is a schematic structural diagram of an apparatus for allocating training data labeling tasks according to an embodiment of the present invention.
Detailed Description
As described in the background art, there is a need for a method for allocating training data annotation tasks, which can efficiently allocate a large number of training data annotation tasks, thereby further improving the efficiency of completing the training data annotation tasks.
The inventor of the present invention found, through research, that in the prior art an administrator of a data annotation platform generally assigns annotators to training data annotation tasks one by one, and each annotator of a data annotation executor must find the tasks allocated by the administrator among a large number of training data annotation tasks on the platform, claim them, and then perform the annotation. Because the numbers of annotators and of training data annotation tasks are both large, the administrator spends a great deal of time allocating the tasks; in addition, when claiming tasks, annotators must search for the ones allocated to them among many similar tasks, and since task names can be very close it is easy to claim the wrong task, so urgent training data annotation tasks get delayed. Therefore, the distribution process of this approach is cumbersome and error-prone, and it depends heavily on manual operation.
In order to solve the above technical problem, an embodiment of the present invention provides a method for allocating training data labeling tasks, the method comprising: receiving task creation information, wherein the task creation information indicates a plurality of training data labeling tasks and the type of each training data labeling task; for each type of training data labeling task, identifying a task execution terminal matching the type; and distributing each type of training data labeling task to the task execution terminal matching it. Because the received task creation information includes the types of the training data labeling tasks, the data labeling platform can look up a matching task execution terminal for each type and, when distributing tasks, send each training data labeling task to a terminal matching its type, so that a large number of training data labeling tasks can be efficiently distributed to suitable task execution terminals, improving labeling efficiency.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
Referring to fig. 1, fig. 1 is a schematic view of an application scenario of an allocation method for a training data labeling task in an embodiment of the present invention. The method may be executed by the data annotation platform 11, where the data annotation platform 11 may be a platform of a data annotation demander (not shown) itself, and the data annotation demander may be a scientific and technological company in the fields of artificial intelligence and the like, but is not limited thereto, and the data annotation platform 11 may also be a third-party platform independent from the data annotation demander and a plurality of data annotation executing parties (for example, an executing subject dedicated to data annotation), but is not limited thereto.
Further, the data annotation platform 11 may include at least one server (not shown), and the data annotation platform 11 is coupled to a database server (not shown), and the database server is used for storing the training data, which may be an internal server of the data annotation demanding party, or other servers storing the training data, but is not limited thereto. The data annotation platform 11 may obtain training data to be annotated from the database server, create a plurality of training data annotation tasks for the training data to be annotated, and then allocate the plurality of training data annotation tasks to the task execution terminals 12 of a plurality of data annotation execution parties for annotation. When the training data is labeled, the training data may be automatically labeled by a plurality of task execution terminals 12, or the training data may be labeled by manually operating the task execution terminals 12, where the task execution terminals 12 may be terminals such as a mobile phone, a computer, and a tablet computer.
Fig. 2 is a schematic flow chart of an allocation method for training data labeling tasks according to an embodiment of the present invention, where the allocation method for training data labeling tasks shown in fig. 2 may include the following steps:
step S201: receiving task creation information, wherein the task creation information indicates a plurality of training data labeling tasks and the type of each training data labeling task;
step S202: for each type of training data labeling task, identifying a task execution terminal matching the type;
step S203: distributing each type of training data labeling task to the task execution terminal matching it.
In the specific implementation of step S201, the task creation information received by the data annotation platform may be sent by the data annotation demander. The task creation information may indicate a plurality of training data annotation tasks and the type of each, where the type of a training data annotation task is determined by the type of its corresponding training data and may be picture, text, video, and so on. For example, if the training data is a picture, the type of the corresponding training data labeling task is picture. The plurality of training data labeling tasks may include at least one type of task; for example, the training data to be labeled by the data labeling demander may include both picture data and text data.
Specifically, when a data labeling demander needs to label training data, task creation information is sent to a data labeling platform, where the task creation information may include the training data to be labeled and the type (text, picture, voice, etc.) of the training data; the task creation information may also include a storage address of training data to be labeled, a type of the training data, and a quantity of each type of training data, and the data labeling platform obtains the training data from the database server according to the type and the quantity of the training data, but is not limited thereto. In one non-limiting embodiment of the present invention, the task creation information may further include at least one task execution terminal specified by the data annotation demander.
Further, the data annotation platform creates a plurality of training data annotation tasks according to the received task creation information, each training data annotation task may include a first preset number of training data of the same type, and the type of the first preset number of training data is the type of the training data annotation task. The first preset quantity may be predetermined by the data annotation platform, or may be specified by the data annotation demander, for example, the task creation information may include the first preset quantity.
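This chunking step can be sketched as follows; the function and field names are illustrative assumptions rather than anything specified in the patent:

```python
def create_labeling_tasks(training_data, data_type, chunk_size):
    """Split training data of one type into labeling tasks of at most
    `chunk_size` items each (the 'first preset number')."""
    tasks = []
    for i in range(0, len(training_data), chunk_size):
        tasks.append({
            "type": data_type,                        # type of the labeling task
            "items": training_data[i:i + chunk_size]  # data to be labeled
        })
    return tasks
```

With a first preset number of 4, for example, ten picture samples would yield two full tasks and one partial task of two items.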
In the specific implementation of step S202, each task execution terminal may have at least one type tag, and the data annotation platform has a terminal list, which may include the terminal identifier and type tag of each task execution terminal of the data annotation executors. Terminal identifiers correspond one to one with task execution terminals, that is, a terminal identifier uniquely determines a task execution terminal. The type tag may indicate the data type the task execution terminal is good at labeling; for example, if the type tag of task execution terminal A in the terminal list is "picture", then terminal A is good at labeling picture data, that is, its accuracy and/or efficiency is higher when labeling picture data. The type tag may also be a blank tag, indicating that it is not yet known which data type the terminal is good at labeling; for example, if the terminal list does not include a type tag for task execution terminal B, the type tag of terminal B is considered blank.
It should be noted that, when the task execution terminal has multiple types of tags, the task execution terminal may match multiple types of training data tagging tasks.
Further, for each type of training data tagging task, the data tagging platform may search, according to the type, a task execution terminal having a type tag that is the same as the type in the terminal list, that is, the task execution terminal having the type tag that is the same as the type is used as a task execution terminal matched with the type. For example, if the type of the training data tagging task is a picture, the data tagging platform may find the task execution terminal with the type tag being the picture in the terminal list.
In a non-limiting embodiment of the present invention, the task execution terminal having the blank tag may also be regarded as a task execution terminal matching the type, and the task execution terminal matching the type includes a task execution terminal having a type tag and/or a blank tag that is the same as the type.
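The tag lookup described in the last two paragraphs might look like the following sketch, where the terminal-list representation (a mapping from terminal identifier to a set of type tags, with an empty set standing for a blank tag) is an assumption:

```python
def match_terminals(terminal_list, task_type, include_blank=True):
    """Return identifiers of terminals whose type tags contain
    `task_type`; optionally also terminals with a blank tag."""
    matched = []
    for terminal_id, tags in terminal_list.items():
        if task_type in tags or (include_blank and not tags):
            matched.append(terminal_id)
    return matched
```

Setting `include_blank=False` restricts matching to terminals whose type tag equals the task type, as in the base embodiment.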
It should be clear that, if the task creation information includes at least one task execution terminal specified by the data annotation demander, the at least one task execution terminal specified by the data annotation demander may also serve as a matched task execution terminal.
In the specific implementation of step S203, each task execution terminal has a corresponding task list, and the task list may include, but is not limited to, a task number and state information of a training data tagging task. The task number may be used to uniquely determine the training data annotation task. The state information may be used to indicate whether the task execution terminal completes the training data tagging task. For each training data labeling task, if the task execution terminal labels all training data corresponding to the task, the state information of the training data labeling task is 'completed', otherwise, the state information of the training data labeling task is 'incomplete', and the sum of the number of the uncompleted training data labeling tasks and the number of completed training data labeling tasks in the task list is the number of the training data labeling tasks required to be completed by the task execution terminal.
Further, for each type of training data labeling task, after the data labeling platform is matched with the corresponding task execution terminal, the training data labeling task to be completed by each task execution terminal can be determined, wherein the task distributed to each task execution terminal by the data labeling platform is the training data labeling task to be completed by the terminal.
Specifically, for each type of training data labeling task, the data labeling platform may divide the type of training data labeling task into multiple parts according to the number of task execution terminals matched with the type, so as to obtain training data labeling tasks to be completed by each task execution terminal; and for each task execution terminal, writing the task number of the training data labeling task which needs to be completed by the terminal into a task list of the terminal.
In one non-limiting embodiment of the present invention, for each type of training data annotation task, the type of training data annotation task may be evenly distributed to each task execution terminal matching the type. For example, if there are M training data tagging tasks of the picture type and N matched task execution terminals, the M training data tagging tasks are averagely allocated to the N task execution terminals, that is, the training data tagging tasks of the picture type that each task execution terminal needs to complete are M/N, and the task numbers of the M/N training data tagging tasks are respectively written into the task lists of the N task execution terminals.
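The even M/N division can be sketched as a round-robin over the matched terminals; this is a hedged sketch, since the patent does not specify how a remainder is handled when M is not divisible by N:

```python
def distribute_evenly(task_numbers, terminal_ids):
    """Assign M task numbers round-robin to N terminals, so each
    terminal's task list receives M/N tasks (one extra for some
    terminals when M is not divisible by N)."""
    task_lists = {terminal_id: [] for terminal_id in terminal_ids}
    for i, task_no in enumerate(task_numbers):
        task_lists[terminal_ids[i % len(terminal_ids)]].append(task_no)
    return task_lists
```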
Further, the terminal list may also include labeling speed information for each task execution terminal, where the labeling speed information may indicate the speed at which the terminal labels training data; for example, it may be the time the terminal took to complete a second preset number of training data labeling tasks, where the second preset number may be predetermined by the data labeling platform. The data labeling platform can therefore also divide the tasks according to the labeling speed information of each task execution terminal, that is, the faster a terminal labels, the more training data labeling tasks it is allocated, making the distribution of training data labeling tasks more reasonable and further improving labeling efficiency.
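One plausible reading of this speed-weighted division is to allocate in proportion to the inverse of the time each terminal took to finish the second preset number of tasks; the weighting scheme and all names below are assumptions:

```python
def distribute_by_speed(task_numbers, completion_times):
    """Allocate tasks in proportion to labeling speed, where
    `completion_times[t]` is the time terminal t took to finish the
    'second preset number' of tasks (lower time = faster terminal)."""
    speeds = {t: 1.0 / dt for t, dt in completion_times.items()}
    total_speed = sum(speeds.values())
    shares = {t: int(len(task_numbers) * s / total_speed)
              for t, s in speeds.items()}
    # hand any rounding remainder to the fastest terminals first
    remainder = len(task_numbers) - sum(shares.values())
    for t in sorted(speeds, key=speeds.get, reverse=True)[:remainder]:
        shares[t] += 1
    task_lists, start = {}, 0
    for t, n in shares.items():
        task_lists[t] = task_numbers[start:start + n]
        start += n
    return task_lists
```

A terminal that finished its benchmark batch twice as fast thus receives twice as many tasks.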
Further, for each task execution terminal, after the data annotation platform adds the training data annotation tasks to be completed to the task list, the platform can monitor the terminal's state information: while the terminal is annotating training data, the state information may be "busy"; when the terminal has completed a training data annotation task and has not yet started the next one, the state information may be "idle". When the data annotation platform observes that the state information indicates the terminal is idle, it can select any uncompleted training data annotation task from the terminal's task list as the next task, and send the training data corresponding to that next task to the terminal for annotation.
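The idle-state dispatch could be sketched like this; the in-place status transition is an assumption about how the platform would avoid dispatching the same task twice:

```python
def dispatch_next(task_list, terminal_state):
    """When a terminal reports 'idle', pick any incomplete task from
    its task list as the next task and mark it so it is not picked
    again; return its task number, or None if nothing is pending."""
    if terminal_state != "idle":
        return None
    for task in task_list:
        if task["status"] == "incomplete":
            task["status"] = "busy"  # its training data is being sent out
            return task["task_no"]
    return None
```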
Further, the task list may further include first progress information, where the first progress information is a ratio of the number of training data tagging tasks completed by the task execution terminal to the number of tasks that the task execution terminal needs to complete. In the process of labeling the training data by the task execution terminal, the data labeling platform may also monitor the first progress information of each task execution terminal, for example, the data labeling platform may periodically read the first progress information of each task execution terminal.
In one non-limiting embodiment of the present invention, each task execution terminal has a single type tag, that is, the training data labeling tasks in its task list are all of one type. In this scenario, if the first progress information of any task execution terminal is observed to reach a first preset threshold, the data annotation platform records that terminal as the first terminal and records the type of training data annotation task matching it as the first type. The first preset threshold may be predetermined, or may be determined by the data annotation platform through real-time calculation, for example according to the annotation speed information of each task execution terminal.
It should be clear that, as described above, the task execution terminal may have at least one type tag, and when the task execution terminal has a plurality of type tags and is allocated to a plurality of types of training data tagging tasks, a task list of the terminal includes the plurality of types of training data tagging tasks, and then the task list may include a plurality of pieces of first progress information, where the first progress information corresponds to the types of the training data tagging tasks one to one. For example, the task list of the task execution terminal includes a picture-type training data labeling task and a text-type training data labeling task, and the task list includes first progress information of a picture type and first progress information of a text type.
It should be clear that, if the task list of the first terminal includes a plurality of pieces of first progress information, the first type is a type corresponding to the first progress information that reaches a first preset threshold.
Further, the data annotation platform can determine whether the tasks to be completed should be reallocated. Specifically, the platform reads the task list of each first-type terminal to determine the training data annotation tasks those terminals have not completed, recording them as the tasks to be completed, where the first-type terminals are all task execution terminals (including the first terminal) matching the first type. For example, if the first type is picture, the platform reads all unfinished training data labeling tasks of the task execution terminals matching the picture type.
Further, the mean number of tasks to be completed is calculated from the number of tasks to be completed and the number of first-type terminals: the mean is the number of tasks to be completed divided by the number of first-type terminals. If this mean and the number of first-type tasks the first terminal has not completed satisfy the preset condition, the tasks to be completed need to be redistributed.
As a non-limiting example, the preset condition may be that the number of unfinished first-type tasks on the first terminal exceeds the mean value by more than a preset difference, or that this number exceeds a preset multiple of the mean value (for example, twice the mean value), but is not limited thereto. The preset difference and the preset multiple may be predetermined by the data annotation platform, or calculated by the platform in real time.
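A minimal sketch of this reallocation check, assuming hypothetical names and example values for the preset difference and preset multiple (the patent leaves both to the platform):

```python
def needs_reallocation(num_pending, num_type_terminals, unfinished_on_first,
                       preset_diff=5, preset_multiple=2.0):
    """Decide whether the pending first-type tasks should be redistributed."""
    # Mean = number of tasks to be completed / number of first type terminals.
    mean = num_pending / num_type_terminals
    # Condition 1: the first terminal's backlog exceeds the mean by a preset difference.
    exceeds_diff = unfinished_on_first - mean > preset_diff
    # Condition 2: the backlog exceeds a preset multiple of the mean (e.g. twice it).
    exceeds_multiple = unfinished_on_first > preset_multiple * mean
    return exceeds_diff or exceeds_multiple
```

With 100 pending tasks across 10 terminals (mean 10), a terminal holding 30 unfinished first-type tasks would trigger reallocation, while one holding 12 would not.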
Further, if the tasks to be completed need to be reallocated, the data annotation platform may reallocate them to the first type terminals.
Further, the data annotation platform may also adjust the first type terminals before reallocating the tasks to be completed. Specifically, the task list may further include second progress information, which indicates the speed at which the task execution terminal completes training data labeling tasks, for example, the number of training data labeling tasks the terminal completed within a preset time. If the second progress information of any task execution terminal among the first type terminals is smaller than a second preset threshold, that terminal is judged to complete first-type training data labeling tasks slowly and to be unsuitable for labeling first-type training data, so the matching relationship between the terminal and the first type can be cancelled, that is, the terminal is removed from the first type terminals, thereby adjusting the first type terminals. The preset time may be set in advance by the data annotation platform, for example 1 hour. The second preset threshold may be predetermined by the platform or calculated by it in real time, and the second progress information may be updated periodically.
Further, if the type tag of any task execution terminal among the first type terminals is not a blank tag and the terminal's second progress information is smaller than the second preset threshold, the data annotation platform may delete, from the terminal list, the terminal's type tag that is identical to the first type; in the subsequent reallocation, no tasks to be completed are assigned to that terminal.
It should be noted that, if the task list includes multiple types of training data labeling tasks, it may include multiple pieces of second progress information, corresponding one to one to the types of training data labeling tasks. The second progress information referred to here is the piece corresponding to the first type.
Further, if the type tag of any task execution terminal among the first type terminals is a blank tag and the terminal's second progress information reaches the second preset threshold, the terminal's type tag in the terminal list may be changed to a type tag identical to the first type. In this way, the type of training data the terminal is good at labeling is recorded, and training data labeling tasks of mismatched types can be prevented from being assigned to the terminal in subsequent allocation.
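The two label adjustments described above (dropping a slow labelled terminal, promoting a fast blank-label terminal) can be sketched as follows; the registry shape and all names are assumptions for illustration:

```python
def adjust_first_type_terminals(terminals, first_type, second_threshold):
    """Filter the first type terminals before reallocation.

    `terminals` maps a terminal id to {"labels": set of type labels,
    "speed": tasks completed within the preset time}; a blank type tag
    is modelled here as an empty label set.
    """
    kept = []
    for tid, info in terminals.items():
        if info["speed"] < second_threshold:
            # Too slow at first-type tasks: cancel the match with the first type.
            info["labels"].discard(first_type)
            continue
        if not info["labels"]:
            # Blank tag but fast enough: record the demonstrated specialty.
            info["labels"].add(first_type)
        kept.append(tid)
    return kept
```

A slow terminal loses its first-type tag and is excluded from reallocation, while a fast blank-tagged terminal is kept and gains the first-type tag.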
Further, after adjusting the first type terminals, the data annotation platform reallocates the tasks to be completed to the adjusted first type terminals. The platform may distribute the tasks evenly among the first type terminals, or divide them according to the second progress information of each terminal, so that a terminal that completed more first-type tasks within the preset time receives more of the tasks to be completed. For each terminal among the first type terminals, the task numbers of its unfinished first-type training data labeling tasks may be deleted from its task list and the task numbers of the newly allocated tasks written in, thereby updating the training data labeling tasks that the terminal needs to complete.
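Division in proportion to the second progress information could look like the greedy sketch below, under the assumption that every terminal's speed is positive; the names are hypothetical:

```python
def divide_pending_tasks(pending_task_ids, terminal_speeds):
    """Split pending tasks so that terminals with higher second progress
    information (tasks completed within the preset time) receive more tasks."""
    allocation = {tid: [] for tid in terminal_speeds}
    counts = {tid: 0 for tid in terminal_speeds}
    for task in pending_task_ids:
        # Give the next task to the terminal lagging furthest behind its
        # proportional quota (assigned count divided by speed).
        target = min(counts, key=lambda t: counts[t] / terminal_speeds[t])
        allocation[target].append(task)
        counts[target] += 1
    return allocation
```

With speeds of 3 and 1, the faster terminal ends up with roughly three quarters of the pending tasks.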
Referring to fig. 3, fig. 3 is a schematic structural diagram of an apparatus for assigning training data labeling tasks according to an embodiment of the present invention. The device for distributing the training data labeling tasks can comprise: a receiving module 31, an identifying module 32, and an assigning module 33.
The receiving module 31 is configured to receive task creation information, where the task creation information indicates a plurality of training data labeling tasks and the type of each training data labeling task; the identification module 32 is configured to, for each type of training data labeling task, identify the task execution terminal matched with the type; the allocation module 33 is configured to allocate each type of training data labeling task to the task execution terminal matched with it.
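A compact sketch of how the identification and allocation modules could cooperate on received task creation information; the terminal registry, the matching rule, and the load-balancing choice are illustrative assumptions:

```python
class TaskAssigner:
    """Toy model of the receiving, identification, and allocation modules."""

    def __init__(self, terminal_labels):
        # terminal_labels: terminal id -> set of type labels (empty set = blank tag)
        self.terminal_labels = terminal_labels

    def identify(self, task_type):
        # A terminal matches a type if it carries that type label or a blank tag.
        return [tid for tid, labels in self.terminal_labels.items()
                if task_type in labels or not labels]

    def allocate(self, task_creation_info):
        # task_creation_info: list of (task_id, task_type) pairs.
        assignments = {tid: [] for tid in self.terminal_labels}
        for task_id, task_type in task_creation_info:
            matched = self.identify(task_type)
            if matched:
                # Balance load among the matched terminals.
                target = min(matched, key=lambda t: len(assignments[t]))
                assignments[target].append(task_id)
        return assignments
```

Here a picture task is split across the picture-labelled terminals, a text task goes to the text-labelled terminal, and a task whose type matches no terminal is left unassigned.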
For more details on the working principle and operation of the distribution device for training data labeling tasks, reference may be made to the descriptions above with respect to fig. 1 and fig. 2, which are not repeated here.
An embodiment of the present invention further provides a storage medium on which a computer program is stored, where the computer program, when executed by a processor, performs the steps of the above distribution method for training data labeling tasks. The storage medium may include a ROM, a RAM, a magnetic disk, an optical disk, or the like, and may further include a non-volatile memory, a non-transient memory, or the like.
An embodiment of the present invention further provides a computing device including a memory and a processor, where the memory stores a computer program capable of running on the processor, and the processor, when running the computer program, performs the steps of the above distribution method for training data labeling tasks. The computing device may be a server or the like.
It should be understood that the term "and/or" herein merely describes an association between objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, both A and B exist, or B exists alone. In addition, the character "/" in this document indicates an "or" relationship between the objects before and after it.
The term "plurality" in the embodiments of the present application means two or more.
The terms "first", "second", and the like in the embodiments of the present application are used only to describe and distinguish the objects concerned; they do not indicate an order, do not limit the number of devices, and do not constitute any limitation on the embodiments of the present application.
Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (12)

1. A method for distributing training data labeling tasks is characterized by comprising the following steps:
receiving task creation information, wherein the task creation information indicates a plurality of training data labeling tasks and the type of each training data labeling task;
for each type of training data labeling task, identifying a task execution terminal matched with the type;
and distributing each type of training data labeling task to the task execution terminal matched with the training data labeling task.
2. The method for distributing the training data labeling tasks according to claim 1, wherein distributing each type of training data labeling task to the task execution terminal matched with the training data labeling task comprises:
for each type of training data labeling task, dividing the type of training data labeling task according to the number of task execution terminals matched with the type to obtain the training data labeling tasks required to be completed by each task execution terminal;
and for each task execution terminal, adding the training data marking task which needs to be completed to the task list of the task execution terminal.
3. The method for assigning training data labeling tasks according to claim 2, wherein assigning each type of training data labeling task to the task execution terminal matched with the training data labeling task further comprises:
monitoring the state information of each task execution terminal, and selecting the next training data labeling task from the task list when the state information indicates that the task execution terminal is in an idle state;
and sending the training data corresponding to the next training data labeling task to the task execution terminal.
4. The method for allocating training data tagging tasks according to claim 2, wherein the task list includes first progress information, and the first progress information is a ratio of the number of completed training data tagging tasks to the number of training data tagging tasks that need to be completed, and the method further includes:
monitoring first progress information of each task execution terminal;
when the first progress information of any task execution terminal reaches a first preset threshold, determining whether tasks to be completed need to be reallocated, wherein the tasks to be completed are training data labeling tasks not completed by first type terminals, the first type terminals are all task execution terminals matched with the same type as the first terminal, and the first terminal is the task execution terminal whose first progress information reaches the first preset threshold;
and if so, reallocating the task to be completed to the first type terminal.
5. The method for allocating training data labeling tasks according to claim 4, wherein the step of determining whether the tasks to be completed need to be reallocated comprises the steps of:
counting the number of the tasks to be completed, and calculating the average value of the number of the tasks to be completed according to the number of the first type terminals;
and comparing the number of training data labeling tasks not completed by the first terminal with the average value, and if the number and the average value satisfy a preset condition, determining that the tasks to be completed need to be reallocated.
6. The method for allocating training data labeling tasks according to claim 4, wherein the task list further includes second progress information, and the re-allocating the to-be-completed task to the first type terminal includes:
reading second progress information of the first type terminal, wherein the second progress information is the number of training data marking tasks completed within a preset time;
and dividing the tasks to be completed according to the second progress information of the first type terminals, so as to update the training data labeling tasks that the first type terminals need to complete.
7. The method according to claim 6, wherein the task execution terminal has a type label, and for each type of training data labeling task, identifying the task execution terminal matching the type comprises:
and for each type of training data labeling task, searching for a task execution terminal with a type label and/or a blank label which are the same as the type so as to obtain a task execution terminal matched with the type.
8. The method for allocating training data annotation tasks according to claim 7, wherein before dividing the tasks to be completed according to the second progress information of the first type terminal, the method further comprises:
and if the type label of at least one task execution terminal in the first type terminal is a blank label and the second progress information of the at least one task execution terminal reaches a second preset threshold, modifying the type label of the at least one task execution terminal into a type label identical to the first type, wherein the first type is a type matched with the first terminal.
9. The method for allocating training data annotation tasks according to claim 8, wherein before dividing the tasks to be completed according to the second progress information of the first type terminal, the method further comprises:
and if the type label of at least one task execution terminal among the first type terminals is a non-blank label and the second progress information of the at least one task execution terminal is smaller than the second preset threshold, removing the type label of the at least one task execution terminal.
10. An apparatus for assigning training data labeling tasks, the apparatus comprising:
the receiving module is used for receiving task creation information, and the task creation information indicates a plurality of training data labeling tasks and the type of each training data labeling task;
the identification module is used for, for each type of training data labeling task, identifying a task execution terminal matched with the type;
and the distribution module is used for distributing each type of training data labeling task to the task execution terminal matched with the training data labeling task.
11. A storage medium having stored thereon a computer program, characterized in that the computer program, when being executed by a processor, performs the steps of the method for assigning training data annotation tasks according to any of the claims 1 to 9.
12. A computing device comprising a memory and a processor, the memory having stored thereon a computer program operable on the processor, wherein the processor, when executing the computer program, performs the steps of the method of assigning training data annotation tasks according to any of claims 1 to 9.
CN202011364814.5A 2020-11-27 2020-11-27 Distribution method and device of training data labeling tasks and computing equipment Pending CN112465032A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011364814.5A CN112465032A (en) 2020-11-27 2020-11-27 Distribution method and device of training data labeling tasks and computing equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011364814.5A CN112465032A (en) 2020-11-27 2020-11-27 Distribution method and device of training data labeling tasks and computing equipment

Publications (1)

Publication Number Publication Date
CN112465032A true CN112465032A (en) 2021-03-09

Family

ID=74809825

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011364814.5A Pending CN112465032A (en) 2020-11-27 2020-11-27 Distribution method and device of training data labeling tasks and computing equipment

Country Status (1)

Country Link
CN (1) CN112465032A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113569546A (en) * 2021-06-16 2021-10-29 上海淇玥信息技术有限公司 Intention labeling method and device and electronic equipment
CN113822137A (en) * 2021-07-23 2021-12-21 腾讯科技(深圳)有限公司 Data annotation method, device and equipment and computer readable storage medium
CN113408745A (en) * 2021-08-20 2021-09-17 北京瑞莱智慧科技有限公司 Task scheduling method, device, equipment and storage medium
CN116702885A (en) * 2023-08-02 2023-09-05 浪潮电子信息产业股份有限公司 Synchronous data parallel training control method, system, device, equipment and medium
CN116702885B (en) * 2023-08-02 2023-11-07 浪潮电子信息产业股份有限公司 Synchronous data parallel training control method, system, device, equipment and medium

Similar Documents

Publication Publication Date Title
CN112465032A (en) Distribution method and device of training data labeling tasks and computing equipment
CN109034188B (en) Method and device for acquiring machine learning model, equipment and storage medium
CN105550345A (en) File operation method and apparatus
CN106708443B (en) Data reading and writing method and device
US20170212930A1 (en) Hybrid architecture for processing graph-based queries
CN111931809A (en) Data processing method and device, storage medium and electronic equipment
CN113032105A (en) Kubernetes cluster access control method, system and related equipment
CN112598366B (en) Automatic planning method and device for asset management approval process
EP2662783A1 (en) Data archiving approach leveraging database layer functionality
CN115062676B (en) Data processing method, device and computer readable storage medium
CN108205559A (en) A kind of data managing method and its equipment
CN110825953A (en) Data query method, device and equipment
CN111552575A (en) Message queue-based message consumption method, device and equipment
CN111400056A (en) Message queue-based message transmission method, device and equipment
CN112052330B (en) Application keyword distribution method and device
CN111309821B (en) Task scheduling method and device based on graph database and electronic equipment
CN112579539A (en) Management method and system for enterprise cluster big data
CN115248831B (en) Labeling method, labeling device, labeling system, labeling equipment and readable storage medium
CN110825959B (en) Data transmission method and selection method and device of list data acquisition model
CN111324792A (en) Big data platform
US20230067107A1 (en) Managing vertex level access in a graph via user defined tag rules
CN112035232B (en) Job operation priority determining method and related equipment
CN113326888B (en) Labeling capability information determining method, related device and computer program product
CN109753246B (en) Hybrid heterogeneous memory-oriented tagged data and job scheduling method and system
CN114816731A (en) Method for distributing test tasks, computer readable storage medium and processor

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210816

Address after: 200080 7th floor, No.137 Haining Road, Hongkou District, Shanghai

Applicant after: Shanghai Xinyi Intelligent Technology Co.,Ltd.

Address before: 100190 1008, 10th floor, building 51, 63 Zhichun Road, Haidian District, Beijing

Applicant before: Beijing Xinyi Intelligent Information Technology Co.,Ltd.

TA01 Transfer of patent application right