WO2021139346A1 - Data labeling system, computer-readable storage medium, and electronic device - Google Patents

Data labeling system, computer-readable storage medium, and electronic device

Info

Publication number
WO2021139346A1
WO2021139346A1 PCT/CN2020/124738 CN2020124738W WO2021139346A1 WO 2021139346 A1 WO2021139346 A1 WO 2021139346A1 CN 2020124738 W CN2020124738 W CN 2020124738W WO 2021139346 A1 WO2021139346 A1 WO 2021139346A1
Authority
WO
WIPO (PCT)
Prior art keywords
account
labeling
data
module
administrator
Prior art date
Application number
PCT/CN2020/124738
Other languages
English (en)
French (fr)
Inventor
巢中迪
庄伯金
王少军
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021139346A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2431 Multiple classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Definitions

  • This application relates to the technical field of blockchain smart contracts, and in particular to a data labeling system, a computer-readable storage medium, and an electronic device.
  • The purpose of this application is to provide a data labeling system, a computer-readable storage medium, and an electronic device.
  • A data labeling system includes:
  • an account management module for maintaining the accounts of the data labeling system and the permissions corresponding to each account, where the accounts include an administrator account and an annotator account, the permissions of the administrator account include creating labeling tasks, and the permissions of the annotator account include processing labeling tasks;
  • an administrator module for creating a labeling task according to the instructions of the administrator account;
  • a data receiving module for receiving the target data corresponding to the labeling task uploaded by the administrator account;
  • an automatic labeling module including multiple data labeling models, each of which is used to process the labeling tasks that match it by labeling the target data corresponding to the labeling task to obtain a labeling result for the target data;
  • an annotator module for providing the annotator account with the target data labeled by the automatic labeling module and the corresponding labeling results, so that the annotator account processes the labeling task by reviewing the labeling results of the labeled target data, and for receiving the review results returned by the annotator account; or for providing unlabeled target data to the annotator account, so that the annotator account processes the labeling task by labeling the unlabeled target data, and for receiving the labeling results returned by the annotator account; and
  • a sending module for sending the target data and the review result and/or labeling result corresponding to each item of target data to the administrator account.
  • A computer-readable storage medium stores computer program instructions which, when executed by a computer, cause the computer to realize the aforementioned data labeling system. The data labeling system includes: an account management module for maintaining the accounts of the data labeling system and the permissions corresponding to each account, where the accounts include an administrator account and an annotator account, the permissions of the administrator account include creating labeling tasks, and the permissions of the annotator account include processing labeling tasks; an administrator module for creating a labeling task according to the instructions of the administrator account; a data receiving module for receiving the target data corresponding to the labeling task uploaded by the administrator account; an automatic labeling module including multiple data labeling models, each of which is used to process the labeling tasks that match it by labeling the target data corresponding to the labeling task to obtain a labeling result for the target data; and an annotator module for providing the annotator account with the target data labeled by the automatic labeling module and the corresponding labeling results, so that the annotator account processes the labeling task by reviewing the labeling results of the labeled target data, and for receiving the review results returned by the annotator account; or for providing unlabeled target data to the annotator account, so that the annotator account processes the labeling task by labeling the unlabeled target data, and for receiving the labeling results returned by the annotator account.
  • An electronic device includes:
  • a memory storing computer-readable instructions which, when executed by the processor, realize the data labeling system described above. The data labeling system includes: an account management module for maintaining the accounts of the data labeling system and the permissions corresponding to each account, where the accounts include an administrator account and an annotator account, the permissions of the administrator account include creating labeling tasks, and the permissions of the annotator account include processing labeling tasks; an administrator module for creating a labeling task according to the instructions of the administrator account; a data receiving module for receiving the target data corresponding to the labeling task uploaded by the administrator account; an automatic labeling module including multiple data labeling models, each of which is used to process the labeling tasks that match it by labeling the target data corresponding to the labeling task to obtain a labeling result for the target data; and an annotator module for providing the annotator account with the target data labeled by the automatic labeling module and the corresponding labeling results, so that the annotator account processes the label
  • By constructing the data labeling system and using its automatic labeling module to label the target data, the efficiency of data labeling is improved and its cost is reduced. In addition, because the annotator module can pass the automatic labeling module's labeling results for the target data to the annotator account for review, both the efficiency and the accuracy of data labeling are taken into account.
  • Fig. 1 is a schematic diagram showing an application architecture of a data labeling system according to an exemplary embodiment.
  • Fig. 2 is a schematic diagram showing a system architecture of a data labeling system according to an exemplary embodiment.
  • Fig. 3 is a schematic diagram showing a classification of multi-level tags and single-level tags according to an exemplary embodiment.
  • Fig. 4 is a block diagram showing an example of an electronic device implementing the above-mentioned data labeling system according to an exemplary embodiment.
  • Fig. 5 shows a program product for implementing the above-mentioned data labeling system according to an exemplary embodiment.
  • Data labeling refers to the process of labeling data to establish corresponding labeling information or labeling results for these data.
  • The label on a piece of data is usually a feature or attribute of that data, and these features or attributes can be used to understand the data. For example, a photo of a face can be labeled with gender or age.
  • Labeled photos can be used to train the corresponding type of machine learning model. For example, gender-labeled face photos can be used to train machine learning models that recognize gender from faces, and age-labeled face photos can be used to train machine learning models that recognize age from faces. Data labeling is therefore a very important task in the field of machine learning and artificial intelligence, and the data labeling system provided in this application offers an efficient tool for it.
  • The implementation terminal of this application can be any device with computing, processing, and communication functions. The device can be connected to an external device to receive or send data. It can be a portable mobile device, such as a smart phone, a tablet computer, a notebook computer, or a PDA (Personal Digital Assistant); a fixed device, such as computer equipment, a field terminal, a desktop computer, a server, or a workstation; or a collection of multiple devices, such as the physical infrastructure of cloud computing or a server cluster.
  • The implementation terminal of this application may be a server or the physical infrastructure of cloud computing.
  • Fig. 1 is a schematic diagram showing an application architecture of a data labeling system according to an exemplary embodiment.
  • The application architecture includes a server 110, an administrator terminal 120, and an annotator terminal 130.
  • The administrator terminal 120 and the annotator terminal 130 are both connected to the server 110 through communication links. Through these links, the administrator terminal 120 and the annotator terminal 130 can receive data sent by the server 110 and can also send data to the server 110.
  • The server 110 runs the data labeling system. In this embodiment, therefore, the server 110 is the implementation terminal of this application.
  • One way of applying the system is as follows. An annotator client corresponding to the data labeling system is installed on the annotator terminal 130, and an administrator client corresponding to the data labeling system is installed on the administrator terminal 120. The data labeling system on the server 110 maintains the administrator account and the annotator account, together with the permissions corresponding to the two accounts.
  • The administrator operates the administrator client to communicate with the administrator module of the data labeling system and create a labeling task, and then, again through the administrator client, communicates with the data receiving module to upload the target data belonging to the labeling task. The automatic labeling module of the data labeling system then determines which of its multiple data labeling models matches the labeling task and uses that model to label the target data and generate labeling results. Next, the annotator module communicates with the annotator client to push the labeled target data and the corresponding labeling results to the annotator account, and receives the review results returned by the annotator account. Finally, the sending module sends the target data and the corresponding review results to the administrator account, completing the data labeling work.
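The flow just described (create a task, upload data, label automatically, review, send back) can be sketched in a few lines. This is only an illustrative sketch: the class `DataLabelingSystem`, its method names, and the toy "length" model are hypothetical and do not come from the application, and the review step is modeled as a plain callback rather than a real client.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# All names here are hypothetical; the source describes the flow, not an API.
LabelModel = Callable[[str], str]  # maps one data item to a label


@dataclass
class LabelingTask:
    name: str
    model_name: str
    data: List[str] = field(default_factory=list)
    auto_labels: Dict[str, str] = field(default_factory=dict)
    reviewed: Dict[str, str] = field(default_factory=dict)


class DataLabelingSystem:
    def __init__(self, models: Dict[str, LabelModel]) -> None:
        self.models = models

    def create_task(self, name: str, model_name: str) -> LabelingTask:
        # Administrator module: create the labeling task.
        if model_name not in self.models:
            raise KeyError(f"no data labeling model named {model_name!r}")
        return LabelingTask(name=name, model_name=model_name)

    def upload(self, task: LabelingTask, items: List[str]) -> None:
        # Data receiving module: accept the target data for the task.
        task.data.extend(items)

    def auto_label(self, task: LabelingTask) -> None:
        # Automatic labeling module: apply the matching model to each item.
        model = self.models[task.model_name]
        task.auto_labels = {item: model(item) for item in task.data}

    def review(self, task: LabelingTask, reviewer: Callable[[str, str], str]) -> None:
        # Annotator module: the reviewer confirms or corrects each label.
        task.reviewed = {
            item: reviewer(item, label) for item, label in task.auto_labels.items()
        }

    def send_to_administrator(self, task: LabelingTask) -> Dict[str, str]:
        # Sending module: return the target data with its review results.
        return dict(task.reviewed)


system = DataLabelingSystem({"length": lambda text: "long" if len(text) > 5 else "short"})
task = system.create_task("text length", "length")
system.upload(task, ["hi", "hello world"])
system.auto_label(task)
system.review(task, lambda item, label: label)  # reviewer accepts every label
results = system.send_to_administrator(task)
```

Keeping each module as one method makes the hand-off points in the description easy to see; a real deployment would place clients and network calls at those points.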
  • Fig. 1 is only one embodiment of the present application. Although the implementation terminal in this embodiment is a server, in other embodiments the implementation terminal may be any of the terminals or devices described above. Although in this embodiment the modules of the data labeling system are located on the same terminal, in other embodiments the modules may be located on different terminals.
  • Although in this embodiment the administrator account and the annotator account communicate with the data labeling system through the administrator client and the annotator client, respectively, and both clients are located outside the local terminal, in other embodiments the administrator client and/or the annotator client can be located on the same terminal, including the local terminal, and the administrator account and the annotator account are not limited to communicating with the data labeling system through a client.
  • This application does not impose any limitation in this regard, and the scope of protection of this application should not be restricted by it in any way.
  • Fig. 2 is a schematic diagram showing a system architecture of a data labeling system according to an exemplary embodiment.
  • The data labeling system provided in this embodiment can be implemented and executed by a server.
  • The data labeling system 200 includes:
  • The account management module 210, which is configured to maintain the accounts of the data labeling system and the permissions corresponding to each account. The accounts of the data labeling system include an administrator account and an annotator account; the permissions of the administrator account include creating labeling tasks, and the permissions of the annotator account include processing labeling tasks.
  • Creating a labeling task is the process of establishing a labeling task that can be executed, and can include specific steps such as entering task information and starting the task process.
  • Processing a labeling task is the actual work related to data labeling, which may include, for example, labeling data.
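The account-to-permission mapping maintained by the account management module can be pictured as a small role table. The sketch below is an assumption about one possible shape: `Permission`, `ROLE_PERMISSIONS`, and `AccountManager` are hypothetical names, and only the two permissions named in the text are modeled.

```python
from enum import Enum, auto
from typing import Dict


class Permission(Enum):
    CREATE_TASK = auto()   # "creating labeling tasks"
    PROCESS_TASK = auto()  # "processing labeling tasks"


# Role names mirror the two account types in the text; the table is illustrative.
ROLE_PERMISSIONS = {
    "administrator": {Permission.CREATE_TASK},
    "annotator": {Permission.PROCESS_TASK},
}


class AccountManager:
    """Hypothetical stand-in for the account management module."""

    def __init__(self) -> None:
        self._roles: Dict[str, str] = {}

    def add_account(self, account_id: str, role: str) -> None:
        if role not in ROLE_PERMISSIONS:
            raise ValueError(f"unknown role: {role!r}")
        self._roles[account_id] = role

    def is_allowed(self, account_id: str, permission: Permission) -> bool:
        # Unknown accounts have no permissions at all.
        role = self._roles.get(account_id)
        return role is not None and permission in ROLE_PERMISSIONS[role]


manager = AccountManager()
manager.add_account("alice", "administrator")
manager.add_account("bob", "annotator")
```

With this table, `manager.is_allowed("bob", Permission.CREATE_TASK)` is false, which matches the split of duties between the two account types.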
  • The administrator module 220 is configured to create a labeling task according to the instructions of the administrator account.
  • The specific actions performed by the administrator module can be as follows: through the front end, it pushes a page for creating labeling tasks to the user of an administrator account. The page contains a button for creating a task and an entry box for task information. After entering the task information in the entry box, the user of the administrator account can create a labeling task by clicking the button.
  • The data receiving module 230 is configured to receive the target data corresponding to the labeling task uploaded by the administrator account.
  • The target data can be any data that can be labeled and used to train a machine learning model, such as image data, voice data, or text data.
  • For example, if the target data is image data, the corresponding labeling task can be to label the gender of the face in the image; if the target data is voice data, the corresponding labeling task can be to label the content expressed by the voice.
  • The administrator module is further configured to delete target data that the administrator account has uploaded, according to the instructions of the administrator account.
  • Allowing the administrator account to delete the target data it has uploaded protects user privacy.
  • The automatic labeling module 240 includes a plurality of data labeling models, each of which is used to process the labeling tasks that match it by labeling the target data corresponding to the labeling task to obtain a labeling result for the target data.
  • Any two of the multiple data labeling models can be similar data labeling models or very different ones.
  • For example, the two data labeling models may both be models for labeling image data, or they may be models for labeling image data and voice data, respectively.
  • The administrator module is further used to send the information corresponding to each data labeling model in the automatic labeling module to the administrator account and, after obtaining the information selected by the administrator account, to use the data labeling model corresponding to that information as the data labeling model that matches the labeling task.
  • Allowing the administrator to select the data labeling model used for a labeling task improves the user experience.
  • The information corresponding to a data labeling model can include the name and function description of the model (for example, the name is "gender labeling model" and the function description is "label the gender of the person in the picture"). When the labeling task created by the administrator account needs to label the gender of the person in the image data, the administrator account can select this information, so that the corresponding data labeling model is used as the data labeling model that matches the labeling task.
  • The administrator module is further configured to obtain the labeling task description information uploaded by the administrator account when the administrator account creates the labeling task.
  • The model matching module is configured to determine, according to the labeling task description information, the data labeling model that matches the labeling task among the multiple data labeling models of the automatic labeling module.
  • The automatic labeling module also includes model description information corresponding to each data labeling model. The similarity between the labeling task description information and each model's description information can be determined, the model description information with the greatest similarity can then be identified, and the data labeling model corresponding to that model description information is used as the data labeling model matching the labeling task.
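The similarity-based matching can be illustrated with a very simple measure. The application does not say which similarity is used, so the sketch below assumes Jaccard similarity over lowercase word tokens; the function names and the second model description are hypothetical.

```python
import re
from typing import Dict, Set


def tokenize(text: str) -> Set[str]:
    # Lowercase alphanumeric word tokens; an assumed, deliberately simple choice.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    # Jaccard similarity: shared tokens divided by all distinct tokens.
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def match_model(task_description: str, model_descriptions: Dict[str, str]) -> str:
    """Return the name of the model whose description is most similar."""
    if not model_descriptions:
        raise ValueError("no data labeling models available")
    return max(
        model_descriptions,
        key=lambda name: jaccard(task_description, model_descriptions[name]),
    )


model_descriptions = {
    "gender labeling model": "label the gender of the person in the picture",
    "speech content model": "label the content expressed by voice",
}
best = match_model("label the gender of each face in the picture", model_descriptions)
```

Any text-similarity measure could be substituted for `jaccard` without changing `match_model`; the argmax over model descriptions is the part the text actually describes.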
  • The annotator module 250 is configured to provide the annotator account with the target data labeled by the automatic labeling module and the corresponding labeling results, so that the annotator account processes the labeling task by reviewing the labeling results of the labeled target data, and to receive the review results returned by the annotator account; or to provide unlabeled target data to the annotator account, so that the annotator account processes the labeling task by labeling the unlabeled target data, and to receive the labeling results returned by the annotator account.
  • The labeling result corresponding to the target data is the label of the target data or the labeling information corresponding to the target data.
  • The target data labeled by the automatic labeling module and the corresponding labeling results can be provided to the annotator account in two ways: the annotator module can actively push them to the annotator account, or the annotator account can actively pull them from the annotator module.
  • The labeling result may be, for example, a label, and the review result may be, for example, a new labeling result obtained by judging whether the labeling result is correct and revising it where necessary.
  • The annotator module is further configured to push a task list that includes the labeling task to the annotator account. Pushing the target data labeled by the automatic labeling module and the corresponding labeling results, or pushing the unlabeled target data, to the annotator account is performed when the labeling task in the task list is triggered.
  • The annotator module can push a page to the annotator account. The task list contained in the page consists of buttons, each corresponding to a task. When a button is triggered by the annotator account, the annotator module responds to the trigger by initiating a push to the annotator account.
  • The annotator account can process the labeling task by labeling the unlabeled target data. In this case the labeling task is performed entirely by humans, which ensures that the labeling maintains a high accuracy rate.
  • Alternatively, the target data labeled by the automatic labeling module and the corresponding labeling results are provided to the annotator account, so that the annotator account processes the labeling task by reviewing the labeling results. In this way, users only need to review the labeling results of the target data to process the labeling task.
  • Most of the labeling work for the target data is then completed automatically by the model, and the labeling task as a whole is completed through human-machine collaboration, which improves labeling efficiency.
  • The review results and/or labeling results in the above data labeling system can also be stored in a blockchain node; that is, the data labeling system can be deployed on a blockchain.
  • The sending module 260 is configured to send the target data and the review results and/or labeling results corresponding to each item of target data to the administrator account.
  • The target data can be sent to the administrator account together with all of its review results and labeling results, or together with only its review results. For target data that has no review result but only a labeling result, the target data and the corresponding labeling result can be sent to the administrator account.
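The "review result if there is one, otherwise the labeling result" rule can be written as a short selection function. This is a sketch under assumptions: the name `build_payload` and the tagged-tuple output are hypothetical, and items with neither kind of result are simply skipped, which the text does not address.

```python
from typing import Dict, Iterable, Tuple


def build_payload(
    target_data: Iterable[str],
    review_results: Dict[str, str],
    labeling_results: Dict[str, str],
) -> Dict[str, Tuple[str, str]]:
    """Pick, per item, the review result if present, else the labeling result."""
    payload: Dict[str, Tuple[str, str]] = {}
    for item in target_data:
        if item in review_results:
            payload[item] = ("review", review_results[item])
        elif item in labeling_results:
            payload[item] = ("labeling", labeling_results[item])
        # Assumption: items with no result of either kind are not sent.
    return payload


payload = build_payload(
    ["a.jpg", "b.jpg", "c.jpg"],
    review_results={"a.jpg": "female"},
    labeling_results={"a.jpg": "male", "b.jpg": "male"},
)
```

Tagging each entry with its origin lets the administrator side distinguish reviewed labels from labels that only the model or an annotator produced.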
  • The permissions of the administrator account further include uploading sample data corresponding to the labeling task and the labeling results corresponding to the sample data, and the data labeling system further includes:
  • The automatic training module, which is configured to receive the multiple sample data corresponding to the labeling task and the labeling result corresponding to each sample uploaded by the administrator account, so that, when none of the multiple data labeling models of the automatic labeling module matches the labeling task, the sample data and their labeling results are used to optimize the data labeling model in the automatic labeling module that has the highest degree of matching with the labeling task.
  • The automatic labeling module includes model information corresponding to each data labeling model.
  • The administrator module is used to send a task creation page to the administrator account. The task creation page includes the information corresponding to each data labeling model, a task description input box, and a button with which the administrator account indicates that no data labeling model matches the labeling task.
  • When the button is clicked, it is considered that none of the multiple data labeling models matches the labeling task, and the administrator account submits task description information through the task creation page.
  • The automatic training module can determine the data labeling model with the highest degree of matching with the labeling task based on each model's information and the task description information, and then use the sample data and the corresponding labeling results to optimize that data labeling model.
  • For example, suppose the labeling task is to label whether an animal such as a horse appears in an image, and the existing data labeling models include no model for labeling whether a horse appears in an image, but only a model for labeling whether a sheep appears in an image.
  • In that case, image sample data labeled as to whether a horse is present can be used to optimize the sheep model, so that the optimized model can handle the labeling task.
  • The optimized data labeling model can then be used to process the labeling task. Even if no data labeling model matches the labeling task, the automatic training module can quickly complete model optimization on a small number of samples, so that the data of the labeling task can still be labeled automatically.
  • The review results and/or labeling results in the data labeling system are stored in the blockchain, and the permissions of the administrator account further include uploading a custom data labeling model and using the custom data labeling model to process labeling tasks. The data labeling system further includes:
  • The custom module, which is used to obtain the custom data labeling model uploaded by the administrator account and, according to the instructions of the administrator account, to use the custom data labeling model to process the labeling tasks created by the administrator account.
  • Users can thus complete labeling tasks with their own models, which provides a more customized and efficient labeling service.
  • The administrator module is further used to obtain the tag type submitted by the administrator account.
  • The tag type is either single-level or multi-level.
  • When the tag type submitted by the administrator account is multi-level, the administrator account also submits the level information of the tag and the range information of each level, where the level information is the number of sub-tags at each level under the tag and the relationship between the sub-tags, and the range information of each level is the content of the sub-tags.
  • Because the data labeling system supports multi-level tags, it can process more fine-grained labeling tasks.
  • Fig. 3 is a schematic diagram showing a classification of multi-level tags and single-level tags according to an exemplary embodiment.
  • In Fig. 3, the car and the person are multi-level tags, and the license plate is a single-level tag.
  • For example, the fact that the car tag includes two sub-tags can be level information; the color sub-tag belongs to the car tag, and the yellow, blue, and purple sub-tags belong to the color sub-tag, which can be the range information of each level.
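A multi-level tag is naturally a tree, and a single-level tag is a tree with no children. The sketch below is illustrative only: the `Tag` class is hypothetical, and because the text names only the color sub-tag of the car tag, the second sub-tag shown in Fig. 3 is left out rather than guessed.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Tag:
    """Hypothetical tree node for a single-level or multi-level tag."""

    name: str
    children: List["Tag"] = field(default_factory=list)

    def is_multi_level(self) -> bool:
        return bool(self.children)

    def depth(self) -> int:
        # A tag with no sub-tags has depth 1.
        return 1 + max((child.depth() for child in self.children), default=0)

    def paths(self, prefix: Tuple[str, ...] = ()) -> List[Tuple[str, ...]]:
        # Every root-to-leaf path is one fully specified label.
        here = prefix + (self.name,)
        if not self.children:
            return [here]
        result: List[Tuple[str, ...]] = []
        for child in self.children:
            result.extend(child.paths(here))
        return result


# Only the sub-tags named in the text are modeled here.
car = Tag("car", [Tag("color", [Tag("yellow"), Tag("blue"), Tag("purple")])])
license_plate = Tag("license plate")
```

Here `car.paths()` enumerates the fine-grained labels an annotator could assign, while `license_plate` stays a single choice.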
  • The administrator account is a project administrator account, and the accounts of the data labeling system further include a system administrator account. The permissions of the system administrator account maintained by the account management module include the permission to review account information. The data labeling system further includes:
  • The registration module, which is used to obtain registration information submitted by a project administrator, where the registration information is used to create a project administrator account for the project administrator;
  • The system administrator module, which is used to provide the registration information obtained by the registration module to the system administrator account for review, and to create a project administrator account corresponding to the registration information when the review is passed.
  • In this way, a project administrator can independently create a project administrator account on the data labeling system.
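The register-then-review sequence can be modeled as a pending queue that a system administrator drains. This sketch is an assumption about one possible structure: `Registration`, `RegistrationDesk`, and the `username` and `organization` fields are all hypothetical, since the text does not list what the registration information contains.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Registration:
    # Hypothetical fields; the source does not specify the registration contents.
    username: str
    organization: str


class RegistrationDesk:
    """Combines the registration and system administrator modules in one place."""

    def __init__(self) -> None:
        self.pending: List[Registration] = []
        self.project_admins: Dict[str, Registration] = {}

    def submit(self, registration: Registration) -> None:
        # Registration module: store the submission until it is reviewed.
        self.pending.append(registration)

    def review(self, username: str, approved: bool) -> bool:
        # System administrator module: create the account only on approval.
        for index, registration in enumerate(self.pending):
            if registration.username == username:
                del self.pending[index]
                if approved:
                    self.project_admins[username] = registration
                return approved
        raise KeyError(f"no pending registration for {username!r}")


desk = RegistrationDesk()
desk.submit(Registration("carol", "Example Lab"))
desk.submit(Registration("dave", "Example Lab"))
carol_ok = desk.review("carol", approved=True)
dave_ok = desk.review("dave", approved=False)
```

A rejected registration is dropped from the queue without creating an account, which is one reasonable reading of "when the review is passed".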
  • The permissions of the project administrator account maintained by the account management module further include setting the target annotator account that processes the labeling task. The administrator module is also used to obtain the target annotator account configured by the project administrator account, and the annotator module is further used to provide the target annotator account with the target data labeled by the automatic labeling module and the corresponding labeling results, so that the target annotator account processes the labeling task by reviewing those labeling results, and to receive the review results returned by the target annotator account.
  • Because the project administrator account can freely set which annotator accounts handle a labeling task, it can select specific annotator accounts in a targeted way, which improves the user experience.
  • The permissions of the project administrator account maintained by the account management module further include setting a first proportion, which is the proportion of labeled target data whose labeling results the annotator account processing the labeling task is to review. The administrator module is also used to obtain the first proportion configured by the project administrator account, and the annotator module is further used to randomly select the first proportion of the target data labeled by the automatic labeling module, together with the corresponding labeling results, and provide the selection to the annotator account, so that the annotator account processes the labeling task by reviewing the randomly selected labeling results; and to receive the review results returned by the annotator account.
  • The project administrator account is thus given authority over how much of the labeled target data and corresponding labeling results is provided to the annotator account for review.
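Random selection of the first proportion can be sketched with the standard library. The function name `sample_for_review` is hypothetical, and two details are assumptions not stated in the text: the sample size is rounded up, and an optional `random.Random` instance is accepted so the selection can be reproduced.

```python
import math
import random
from typing import Dict, Optional


def sample_for_review(
    labeled: Dict[str, str],
    proportion: float,
    rng: Optional[random.Random] = None,
) -> Dict[str, str]:
    """Randomly pick `proportion` of the auto-labeled items for human review."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must be between 0 and 1")
    rng = rng or random.Random()
    # Assumption: round up, so any non-zero proportion reviews at least one item.
    count = math.ceil(len(labeled) * proportion)
    # Sorting the keys makes the draw reproducible for a seeded generator.
    chosen = rng.sample(sorted(labeled), count)
    return {item: labeled[item] for item in chosen}


labeled = {f"img{i}.jpg": "male" for i in range(4)}
picked = sample_for_review(labeled, 0.5, random.Random(0))
```

Sampling without replacement via `random.sample` guarantees that no item is sent for review twice.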
  • The accounts of the data labeling system further include an auditor account. The permissions of the auditor account maintained by the account management module include the permission to audit the review results of an annotator account and the permission to obtain the audit conclusion corresponding to the review results of the audited annotator account. The permissions of the project administrator account maintained by the account management module also include obtaining the audit conclusion of an annotator account, where the annotator account is one that processes a labeling task created by the project administrator account. The data labeling system further includes:
  • The auditor module, which is used to obtain from the annotator module the target data and the review results and/or labeling results corresponding to each item of target data, to send at least part of the target data and the corresponding review results and/or labeling results to the auditor account for auditing, and to receive from the auditor account the audit results for those review results and/or labeling results;
  • The decision-making module, which is used to generate an audit conclusion based on the audit results provided by the auditor module;
  • The information presentation module, which is configured to return the audit conclusion to the auditor account and/or the project administrator account according to a request from the auditor account and/or the project administrator account.
  • the audit result can include, for example, whether the review of the annotator's account is correct, whether the label is correct, and which target data is incorrectly reviewed.
  • the audit conclusion is a summary of the audit result.
  • generating the audit conclusion automatically reduces the manual input required of auditors and speeds up auditing.
  • by allowing the auditor module to audit the work of the annotator module, the system provides supervision and enables the data labeling task to be completed more efficiently.
  • the authority of the project administrator account maintained by the account management module further includes obtaining the scoring result for an annotator account that processes a labeling task created by the project administrator account; the authority of the annotator account maintained by the account management module also includes obtaining the scoring result for that annotator account; and the annotator module is further configured to generate completion progress information based on the annotator account's completion of the labeling task;
  • the decision-making module is further configured to obtain the annotator account's completion progress information for the labeling task, and to score the annotator account based on the audit result and the completion progress information;
  • the information presentation module is further configured to return the scoring result for the annotator account to the annotator account and/or the project administrator account according to a request from the annotator account and/or the project administrator account.
  • the completion progress information may include, for example, a first percentage: the share of the labeling task that has been completed.
  • a second percentage, the number of correctly reviewed target data as a share of all reviewed target data, can be calculated from the audit result.
  • a score can be obtained from the first percentage and the second percentage using certain rules, and this score can be used as the scoring result for the annotator account.
  • the authority of the project administrator account maintained by the account management module further includes the authority to set the target auditor account for processing the labeling task; the administrator module is also used to obtain the target auditor account configured by the project administrator account for processing the labeling task; and the auditor module is further used to:
  • after obtaining from the annotator module the target data and the review results and/or labeling results corresponding to each target data, send at least part of the review results and/or labeling results corresponding to each target data to the target auditor account for audit, and receive from the target auditor account the audit results for at least part of those review results and/or labeling results.
  • the authority of the project administrator account maintained by the account management module further includes the authority to set the labeling method required to process the labeling task. The administrator module is also used to obtain the labeling method uploaded by the project administrator account and provide it to the annotator module. The annotator module is also used to provide the labeling method to the annotator account that processes a labeling task created by the project administrator account, and the annotator account executes the labeling task according to the labeling method.
  • the labeling method provided by the project administrator account may be supplied independently by the project administrator account, or may be determined by the project administrator account's selection after the administrator module pushes multiple labeling methods to it.
  • the data labeling system is pre-configured with multiple labeling methods.
  • the administrator module of the data labeling system pushes a page containing these labeling methods to the project administrator account. After the project administrator account selects a labeling method on the page, the administrator module treats it as the labeling method provided by the project administrator account.
  • the labeling method is the way in which the annotator account labels. For example, it may specify whether labeling is done by ticking a box or by clicking a button, or whether the annotator marks a result as right or wrong or instead marks the exact result.
  • the authority of the project administrator account maintained by the account management module further includes the authority to obtain each annotator account's completion progress information for the labeling tasks created by the project administrator account, and the administrator module is also used to return to the project administrator account, according to a request from that account, the completion progress information of at least one annotator account for the labeling task created by the project administrator account.
  • the authority of the project administrator account maintained by the account management module further includes the authority to set a second proportion, namely the proportion of the review results from the annotator module that the auditor account audits; the administrator module is also used to obtain the second proportion configured by the project administrator account; and the auditor module is further used to:
  • after obtaining from the annotator module the target data and the review results and/or labeling results corresponding to each target data, randomly select the second proportion of the target data and the corresponding review results and/or labeling results, send the selection to the auditor account for audit, and receive from the auditor account the audit results for the second proportion of the review results and/or labeling results.
  • the project administrator account is given the authority to decide what proportion of the target data and the corresponding review results and/or labeling results the auditor module provides to the auditor account for audit, which improves the user experience.
  • the authority of the project administrator account maintained by the account management module further includes the authority to set an optimization mode, which is used to optimize the data labeling model that processes the labeling tasks created by the project administrator account; the administrator module is also used to obtain the optimization mode configured by the project administrator account.
  • the optimization mode includes periodically obtaining, through the administrator module, the labeled data corresponding to the labeling task uploaded by the project administrator account, so as to train the data labeling model that processes the labeling tasks created by the project administrator account.
  • the optimization method includes optimization conditions and optimization means corresponding to the optimization conditions.
  • the optimization condition includes: the number of target data whose labeling result is inconsistent with the review result reaches a predetermined number threshold; and the ratio of the number of target data whose labeling result is inconsistent with the review result to the number of all review results reaches a predetermined ratio threshold.
  • the optimization means corresponding to the optimization condition includes: obtaining, through the administrator module, a plurality of pre-labeled data and corresponding labels for the labeling task uploaded by the project administrator account, so as to train the data labeling model that processes the labeling task; and sending, through the annotator module, the target data labeled by the automatic labeling module and the corresponding review results to the data labeling model that processes the labeling task for training.
  • the optimization condition reflects that the data labeling model cannot complete the labeling task accurately and therefore needs to be optimized. Take the condition "the number of target data whose corresponding labeling results are inconsistent with the review results reaches a predetermined number threshold": for the same set of target data, it indicates that the data labeling model's labeling results are inconsistent with the annotator account's review of those results. Because manual review results are generally more accurate than the labeling results of the data labeling model, the condition indicates that the model's accuracy on the labeling task is not high. The data labeling model can then be optimized through the optimization means above, improving its accuracy in processing labeling tasks.
  • the connection relationship of the modules in the system architecture diagram of the data labeling system shown in FIG. 2 is exemplary. In actual applications, various connection modes can be designed between the modules, which is not limited in this application.
  • because the automatic labeling module of the data labeling system labels the target data, the efficiency of data labeling is improved and the cost of data labeling is reduced.
  • because the annotator module of the data labeling system can also hand the automatic labeling module's labeling results for the target data to the annotator account for review, data labeling efficiency and data labeling accuracy are balanced.
  • an electronic device capable of implementing the above-mentioned data labeling system, wherein the data labeling system includes:
  • the account management module is used to maintain the account of the data labeling system and the corresponding authority of each account.
  • the account of the data labeling system includes an administrator account and an annotator account.
  • the authority of the administrator account includes creating a labeling task, and the authority of the annotator account includes processing labeling tasks;
  • the administrator module is used to create a labeling task according to the instructions of the administrator account
  • a data receiving module configured to receive target data corresponding to the labeling task uploaded by the administrator account
  • the automatic labeling module includes multiple data labeling models, and each data labeling model is used to process the labeling task that matches it, so as to label the target data corresponding to that labeling task and obtain the labeling result of the target data;
  • the annotator module is used to provide the target data that has been labeled by the automatic labeling module and the corresponding labeling result to the annotator account, so that the annotator account processes the labeling task by reviewing the labeling result of the labeled target data, and to receive the review result returned by the annotator account; or to provide the unlabeled target data to the annotator account, so that the annotator account processes the labeling task by labeling the unlabeled target data, and to receive the labeling result returned by the annotator account;
  • the sending module is used to send the target data and the review results and/or annotation results corresponding to each target data to the administrator account.
  • the electronic device 400 according to this embodiment of the present application will be described below with reference to FIG. 4.
  • the electronic device 400 shown in FIG. 4 is only an example, and should not bring any limitation to the function and scope of use of the embodiments of the present application.
  • the electronic device 400 is represented in the form of a general-purpose computing device.
  • the components of the electronic device 400 may include, but are not limited to: the aforementioned at least one processing unit 410, the aforementioned at least one storage unit 420, and a bus 430 connecting different system components (including the storage unit 420 and the processing unit 410).
  • the storage unit stores program code that can be executed by the processing unit 410, so that the processing unit 410 executes the steps of the various exemplary implementations described in the "embodiment" section of this specification.
  • the storage unit 420 may include a computer-readable storage medium in the form of a volatile storage unit, such as a random access storage unit (RAM) 421 and/or a cache storage unit 422, and may further include a read-only storage unit (ROM) 423.
  • the storage unit 420 may also include a program/utility tool 424 having a set of (at least one) program module 425.
  • program module 425 includes but is not limited to: an operating system, one or more application programs, other program modules, and program data. Each of these examples or some combination may include the implementation of a network environment.
  • the bus 430 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus structures.
  • the electronic device 400 may also communicate with one or more external devices 600 (such as keyboards, pointing devices, Bluetooth devices, etc.), with one or more devices that enable a user to interact with the electronic device 400, and/or with any device (such as a router, modem, etc.) that enables the electronic device 400 to communicate with one or more other computing devices. This communication can be performed through an input/output (I/O) interface 450.
  • the electronic device 400 may also communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network, such as the Internet) through the network adapter 460.
  • the network adapter 460 communicates with other modules of the electronic device 400 through the bus 430. It should be understood that although not shown in the figure, other hardware and/or software modules can be used in conjunction with the electronic device 400, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
  • the example embodiments described here can be implemented by software, or by combining software with the necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile computer-readable storage medium (which can be a CD-ROM, USB flash drive, portable hard disk, etc.) or on a network, and which includes several instructions that enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to implement the data labeling system according to the embodiments of the present application.
  • the computer-readable storage medium may be non-volatile or volatile.
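  • a computer-readable storage medium stores a program product capable of implementing the above-mentioned data labeling system, wherein the data labeling system includes: an account management module for maintaining the accounts of the data labeling system and the corresponding authority of each account, where the accounts of the data labeling system include an administrator account and an annotator account, the authority of the administrator account includes creating a labeling task, and the authority of the annotator account includes processing labeling tasks; an administrator module for creating a labeling task according to the instructions of the administrator account; a data receiving module for receiving the target data corresponding to the labeling task uploaded by the administrator account; an automatic labeling module that includes a plurality of data labeling models, each data labeling model being used to process the labeling task that matches it, so as to label the target data corresponding to that labeling task and obtain the labeling result of the target data; an annotator module for providing the target data that has been labeled by the automatic labeling module and the corresponding labeling result to the annotator account, so that the annotator account processes the labeling task by reviewing the labeling result, and for receiving the review result returned by the annotator account, or for providing the unlabeled target data to the annotator account, so that the annotator account processes the labeling task by labeling it, and for receiving the labeling result returned by the annotator account; and a sending module for sending the target data and the review results and/or labeling results corresponding to each target data to the administrator account.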
  • various aspects of the present application can also be implemented in the form of a program product that includes program code. When the program product runs on a terminal device, the program code causes the terminal device to execute the steps according to the various exemplary embodiments of the present application described in the above-mentioned "Exemplary System" section of this specification.
  • a program product 500 for implementing the above-mentioned data labeling system is described. It is stored on a computer-readable storage medium, may take the form of a portable compact disc read-only memory (CD-ROM) that includes program code, and can run on a terminal device such as a personal computer.
  • the program product of this application is not limited to this.
  • the readable storage medium can be any tangible medium that contains or stores a program, and the program can be used by or in combination with an instruction execution system, apparatus, or device.
  • the program product can use any combination of one or more readable media.
  • the readable medium may be a readable signal medium or a readable storage medium.
  • the readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • the computer-readable signal medium may include a data signal propagated in baseband or as a part of a carrier wave, and readable program code is carried therein. This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the readable signal medium may also be any readable medium other than a readable storage medium, and the readable medium may send, propagate, or transmit a program for use by or in combination with the instruction execution system, apparatus, or device.
  • the program code contained on the readable medium can be transmitted by any suitable medium, including but not limited to wireless, wired, optical cable, RF, etc., or any suitable combination of the foregoing.
  • the program code used to perform the operations of this application can be written in any combination of one or more programming languages.
  • the programming languages include object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code can be executed entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
  • the remote computing device can be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computing device (for example, through the Internet using an Internet service provider).
  • the blockchain referred to in the present invention is a new application mode of computer technology such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • a blockchain is essentially a decentralized database: a series of data blocks linked to one another using cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
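
The optimization conditions listed earlier (a count threshold and a ratio threshold on labeling results that disagree with the review results) can be illustrated with a minimal sketch. The function name, the argument names, and the choice to treat either condition as sufficient are assumptions made for illustration; the text does not say how the two conditions combine.

```python
def needs_optimization(pairs, count_threshold, ratio_threshold):
    """Decide whether a data labeling model should be re-trained.

    pairs: list of (model_label, reviewed_label) tuples for reviewed target data.
    Assumption: meeting EITHER threshold triggers optimization.
    """
    if not pairs:
        return False  # nothing has been reviewed yet
    # Count target data whose model label disagrees with the human review.
    mismatches = sum(1 for model_label, reviewed in pairs if model_label != reviewed)
    ratio = mismatches / len(pairs)
    return mismatches >= count_threshold or ratio >= ratio_threshold
```

With four reviewed items of which one disagrees, a ratio threshold of 0.25 would be met even if the count threshold were set much higher.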

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Document Processing Apparatus (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

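This application relates to the field of blockchain smart contract technology and discloses a data labeling system, a medium, and an electronic device. The system includes: an account management module for maintaining accounts and their corresponding permissions; an administrator module for creating a labeling task according to an instruction from an administrator account; a data receiving module for receiving data uploaded by the administrator account; an automatic labeling module that includes labeling models for processing labeling tasks; an annotator module for providing the annotator account with data labeled by the automatic labeling module together with the labeling results, so that the annotator account reviews the labeling results, and for receiving the review results, or for providing the annotator account with unlabeled data, so that the annotator account labels the unlabeled data, and for receiving the labeling results; and a sending module for sending the target data and the review results or labeling results to the administrator account, where the review results and/or labeling results may be stored in a blockchain. This application balances data labeling efficiency against accuracy.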

Description

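Data labeling system, computer-readable storage medium, and electronic device
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on May 28, 2020, with application number 202010469546.7 and titled "Data labeling system, computer-readable storage medium, and electronic device", the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of blockchain smart contract technology, and in particular to a data labeling system, a computer-readable storage medium, and an electronic device.
Background
With the development of emerging technologies such as machine learning and artificial intelligence, data labeling has become an unavoidable and labor-intensive task in the field of supervised learning. The inventors realized that labeling large amounts of data requires substantial manpower and material resources, which results in a heavy labeling workload, low labeling efficiency, and high labeling cost.
Technical problem
Labeling large amounts of data requires substantial manpower and material resources, which results in a heavy labeling workload, low labeling efficiency, and high labeling cost.
Technical solution
In the field of blockchain smart contract technology, to solve the above technical problem, the purpose of this application is to provide a data labeling system, a computer-readable storage medium, and an electronic device.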
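According to a first aspect of this application, a data labeling system is provided. The data labeling system includes: an account management module for maintaining the accounts of the data labeling system and the permissions corresponding to each account, where the accounts of the data labeling system include an administrator account and an annotator account, the permissions of the administrator account include creating labeling tasks, and the permissions of the annotator account include processing labeling tasks; an administrator module for creating a labeling task according to an instruction from the administrator account; a data receiving module for receiving target data that corresponds to the labeling task and is uploaded by the administrator account; an automatic labeling module that includes a plurality of data labeling models, each data labeling model being used to process the labeling task that matches it, so as to label the target data corresponding to that labeling task and obtain a labeling result for the target data; an annotator module for providing the annotator account with the target data already labeled by the automatic labeling module and the corresponding labeling results, so that the annotator account processes the labeling task by reviewing the labeling results of the labeled target data, and for receiving the review results returned by the annotator account, or for providing the annotator account with the unlabeled target data, so that the annotator account processes the labeling task by labeling the unlabeled target data, and for receiving the labeling results returned by the annotator account; and a sending module for sending the target data and the review results and/or labeling results corresponding to each item of target data to the administrator account.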
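According to a second aspect of this application, a computer-readable storage medium is provided, which stores computer program instructions that, when executed by a computer, cause the computer to implement the data labeling system described above, where the data labeling system includes: an account management module for maintaining the accounts of the data labeling system and the permissions corresponding to each account, where the accounts of the data labeling system include an administrator account and an annotator account, the permissions of the administrator account include creating labeling tasks, and the permissions of the annotator account include processing labeling tasks; an administrator module for creating a labeling task according to an instruction from the administrator account; a data receiving module for receiving target data that corresponds to the labeling task and is uploaded by the administrator account; an automatic labeling module that includes a plurality of data labeling models, each data labeling model being used to process the labeling task that matches it, so as to label the target data corresponding to that labeling task and obtain a labeling result for the target data; an annotator module for providing the annotator account with the target data already labeled by the automatic labeling module and the corresponding labeling results, so that the annotator account processes the labeling task by reviewing the labeling results of the labeled target data, and for receiving the review results returned by the annotator account, or for providing the annotator account with the unlabeled target data, so that the annotator account processes the labeling task by labeling the unlabeled target data, and for receiving the labeling results returned by the annotator account; and a sending module for sending the target data and the review results and/or labeling results corresponding to each item of target data to the administrator account.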
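According to another aspect of this application, an electronic device is provided, the electronic device including:
a processor;
a memory storing computer-readable instructions that, when executed by the processor, implement the data labeling system described above, where the data labeling system includes: an account management module for maintaining the accounts of the data labeling system and the permissions corresponding to each account, where the accounts of the data labeling system include an administrator account and an annotator account, the permissions of the administrator account include creating labeling tasks, and the permissions of the annotator account include processing labeling tasks; an administrator module for creating a labeling task according to an instruction from the administrator account; a data receiving module for receiving target data that corresponds to the labeling task and is uploaded by the administrator account; an automatic labeling module that includes a plurality of data labeling models, each data labeling model being used to process the labeling task that matches it, so as to label the target data corresponding to that labeling task and obtain a labeling result for the target data; an annotator module for providing the annotator account with the target data already labeled by the automatic labeling module and the corresponding labeling results, so that the annotator account processes the labeling task by reviewing the labeling results of the labeled target data, and for receiving the review results returned by the annotator account, or for providing the annotator account with the unlabeled target data, so that the annotator account processes the labeling task by labeling the unlabeled target data, and for receiving the labeling results returned by the annotator account; and a sending module for sending the target data and the review results and/or labeling results corresponding to each item of target data to the administrator account.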
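Beneficial effects
In the data labeling system, computer-readable storage medium, and electronic device provided by this application, a data labeling system is constructed and its automatic labeling module is used to label the target data, which improves data labeling efficiency and reduces data labeling cost. In addition, because the annotator module of the data labeling system can hand the automatic labeling module's labeling results for the target data to the annotator account for review, data labeling efficiency and data labeling accuracy are balanced.
Brief description of the drawings
FIG. 1 is a schematic diagram of an application architecture of a data labeling system according to an exemplary embodiment;
FIG. 2 is a schematic diagram of a system architecture of a data labeling system according to an exemplary embodiment;
FIG. 3 is a schematic diagram of the classification of multi-level labels and single-level labels according to an exemplary embodiment;
FIG. 4 is an example block diagram of an electronic device implementing the above data labeling system according to an exemplary embodiment;
FIG. 5 shows a program product implementing the above data labeling system according to an exemplary embodiment.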
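Best mode for carrying out the invention
Exemplary embodiments are described in detail here, with examples shown in the accompanying drawings. When the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this application. Rather, they are merely examples of systems and electronic devices consistent with some aspects of this application as detailed in the appended claims.
In addition, the drawings are merely schematic illustrations of this application and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, so their repeated description is omitted. Some of the block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically independent entities.
This application first provides a data labeling system. Data labeling is the process of attaching labels to data in order to establish corresponding labeling information or labeling results for that data. The label attached to an item of data is usually a feature or attribute of that item, and these features or attributes can be used to understand it. For example, a photo of a face can be labeled with a gender label or with an age label. Photos labeled in this way can then be used to train the corresponding type of machine learning model: face photos with gender labels can be used to train a model that recognizes gender from a face, and face photos with age labels can be used to train a model that recognizes age from a face. Data labeling is therefore a very important task in machine learning and artificial intelligence, and the data labeling system provided by this application offers an efficient tool for it.
The implementing terminal of this application may be any device with computing, processing, and communication functions. The device can be connected to external devices to receive or send data. Specifically, it may be a portable mobile device, such as a smartphone, tablet computer, notebook computer, or PDA (Personal Digital Assistant); a fixed device, such as a computer device, field terminal, desktop computer, server, or workstation; or a collection of multiple devices, such as the physical infrastructure of cloud computing or a server cluster.
Optionally, the implementing terminal of this application may be a server or the physical infrastructure of cloud computing.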
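FIG. 1 is a schematic diagram of an application architecture of a data labeling system according to an exemplary embodiment. As shown in FIG. 1, the application architecture includes a server 110, an administrator terminal 120, and an annotator terminal 130. The administrator terminal 120 and the annotator terminal 130 are each connected to the server 110 through a communication link, over which they can receive data from the server 110 and send data to it. The data labeling system runs on the server 110, so in this embodiment the server 110 is the implementing terminal of this application. When the data labeling system provided by this application is applied in the application architecture shown in FIG. 1, one mode of application may be as follows. An annotator client corresponding to the data labeling system is installed on the annotator terminal 130, an administrator client corresponding to the data labeling system is installed on the administrator terminal 120, and the data labeling system on the server 110 maintains the administrator account, the annotator account, and the permissions corresponding to both types of account. First, the administrator operates the administrator client to communicate with the administrator module of the data labeling system and create a labeling task, and then operates the administrator client to communicate with the data receiving module of the data labeling system and upload the target data belonging to that labeling task. Next, the automatic labeling module of the data labeling system determines, among the plurality of data labeling models, the data labeling model that matches the labeling task, and uses it to label the target data and generate labeling results. The annotator module of the data labeling system then communicates with the annotator client to push the labeled target data and the corresponding labeling results to the annotator account, and receives the review results returned by the annotator account. Finally, the sending module of the data labeling system sends the target data and the corresponding review results to the administrator account, completing the data labeling work.
It is worth mentioning that FIG. 1 is only one embodiment of this application. Although the implementing terminal in this embodiment is a server, in other embodiments it may be any of the terminals or devices described above. Although the modules of the data labeling system are located on the same terminal in this embodiment, in other embodiments they may be located on different terminals. Although in this embodiment the administrator account and the annotator account communicate with the data labeling system through the administrator client and the annotator client respectively, and the two clients are located on different terminals other than the local terminal, in other embodiments or specific applications the administrator client and/or the annotator client may be located on the same terminal, including the local terminal, and the administrator account and the annotator account are not limited to communicating with the data labeling system through clients. This application imposes no limitation in this respect, and its scope of protection should not be limited thereby.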
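FIG. 2 is a schematic diagram of a system architecture of a data labeling system according to an exemplary embodiment. The data labeling system provided in this embodiment may be implemented and executed by a server. As shown in FIG. 2, the data labeling system 200 includes:
An account management module 210 for maintaining the accounts of the data labeling system and the permissions corresponding to each account, where the accounts include an administrator account and an annotator account, the permissions of the administrator account include creating labeling tasks, and the permissions of the annotator account include processing labeling tasks.
Creating a labeling task is the process of establishing an executable labeling task, and may include specific steps such as entering task information and starting the task workflow.
Processing a labeling task means carrying out the actual workflow related to labeling the data, which may include, for example, labeling the data.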
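
As an illustration of how an account management module might associate accounts with permissions, the following is a minimal sketch. The class name `AccountManager`, the role strings, and the `Permission` enum are hypothetical and not taken from the application; only the two permissions (creating and processing labeling tasks) come from the text.

```python
from enum import Enum, auto


class Permission(Enum):
    CREATE_TASK = auto()   # held by administrator accounts
    PROCESS_TASK = auto()  # held by annotator accounts


class AccountManager:
    """Hypothetical account management module: account -> role -> permissions."""

    ROLE_PERMISSIONS = {
        "administrator": {Permission.CREATE_TASK},
        "annotator": {Permission.PROCESS_TASK},
    }

    def __init__(self):
        self._roles = {}  # account id -> role name

    def register(self, account_id, role):
        if role not in self.ROLE_PERMISSIONS:
            raise ValueError(f"unknown role: {role}")
        self._roles[account_id] = role

    def is_allowed(self, account_id, permission):
        # Unknown accounts have no permissions at all.
        role = self._roles.get(account_id)
        return role is not None and permission in self.ROLE_PERMISSIONS[role]
```

Further roles described in later embodiments (project administrator, system administrator, auditor) could be added as extra entries in `ROLE_PERMISSIONS` under the same assumption.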
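An administrator module 220 for creating a labeling task according to an instruction from the administrator account.
For example, the administrator module may act as follows: through the front end, it pushes a page for creating labeling tasks to the user of the administrator account. The page has a button for creating a task and input boxes for entering task information. After entering the task information in the input boxes, the user of the administrator account clicks the create-task button to create the labeling task.
A data receiving module 230 for receiving the target data that corresponds to the labeling task and is uploaded by the administrator account.
The target data may be any data that can be labeled and used to train a machine learning model, such as image data, speech data, or text data.
For example, if the target data is image data, the corresponding labeling task may be to label the gender of the faces in the images; if the target data is speech data, the corresponding labeling task may be to label the content expressed by the speech.
In one embodiment, the administrator module is further used to delete, according to an instruction from the administrator account, target data that the administrator account has uploaded.
In this embodiment, allowing the administrator account to delete the target data it has uploaded protects user privacy.
An automatic labeling module 240 that includes a plurality of data labeling models, each used to process the labeling task that matches it, so as to label the target data corresponding to that labeling task and obtain a labeling result for the target data.
Any two of the data labeling models may be similar to each other or may differ greatly. For example, two data labeling models may both be models for labeling image data, or one may be a model for labeling image data and the other a model for labeling speech data.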
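In one embodiment, the administrator module is further used to:
send the information corresponding to each data labeling model in the automatic labeling module to the administrator account and, after obtaining the information selected by the administrator account, use the data labeling model corresponding to that information as the data labeling model that matches the labeling task.
In this embodiment, allowing the administrator to choose the data labeling model used for the labeling task improves the user experience.
For example, the information corresponding to a data labeling model may include the model's name and a functional description (for example, the name is "gender labeling model" and the functional description is "labels the gender of the person in a picture"). When the labeling task created by the administrator account requires labeling the gender of the people in picture data, the administrator can select this information, so that the corresponding data labeling model is used as the model that matches the labeling task.
In one embodiment, the administrator module is further used to obtain the labeling task description information uploaded by the administrator account when the administrator account creates the labeling task;
the data labeling system further includes:
a model matching module for determining, according to the labeling task description information, the data labeling model that matches the labeling task among the plurality of data labeling models of the automatic labeling module.
For example, the automatic labeling module further includes model description information corresponding to each data labeling model. The similarity between the labeling task description information and each item of model description information can be determined, the model description information with the greatest similarity identified, and the data labeling model corresponding to it used as the data labeling model that matches the labeling task.
This embodiment achieves automatic matching between data labeling models and labeling tasks.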
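
The similarity-based matching described above can be sketched as follows. The application does not name a similarity measure, so this sketch assumes a simple Jaccard similarity over whitespace-separated words; the function names and the example descriptions are hypothetical.

```python
def jaccard(a: str, b: str) -> float:
    """Word-set Jaccard similarity (an assumed stand-in for the unspecified measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def match_model(task_description: str, model_descriptions: dict) -> str:
    """Return the name of the model whose description is most similar to the task."""
    if not model_descriptions:
        raise ValueError("no data labeling models registered")
    return max(
        model_descriptions,
        key=lambda name: jaccard(task_description, model_descriptions[name]),
    )
```

A production system would more plausibly compare text embeddings, and for Chinese descriptions would need a proper tokenizer; the word-set measure is used here only to keep the sketch self-contained.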
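An annotator module 250 for providing the annotator account with the target data already labeled by the automatic labeling module and the corresponding labeling results, so that the annotator account processes the labeling task by reviewing the labeling results of the labeled target data, and for receiving the review results returned by the annotator account; or
for providing the annotator account with the unlabeled target data, so that the annotator account processes the labeling task by labeling the unlabeled target data, and for receiving the labeling results returned by the annotator account.
The labeling result corresponding to an item of target data is the label of that target data, or the labeling information corresponding to it.
The target data labeled by the automatic labeling module and the corresponding labeling results may be provided to the annotator account by the annotator module actively pushing them to the annotator account, or by the annotator account actively pulling them from the annotator module.
The labeling result may be, for example, a label, and the review result may be, for example, the new labeling result obtained by judging whether the labeling result is correct and correcting it.
In one embodiment, the annotator module is further used to push a task list to the annotator account, the task list including the labeling task, where the labeled target data and corresponding labeling results, or the unlabeled target data, are pushed to the annotator account when the labeling task in the task list is triggered.
For example, the annotator module may push a page to the annotator account in which the task list appears as a button for each of at least one task. When the annotator account triggers a button, the push to the annotator account is initiated in response.
When the unlabeled target data is provided to the annotator account so that the annotator account processes the labeling task by labeling it, the labeling task is carried out entirely by people, which keeps labeling accuracy high. When the target data already labeled by the automatic labeling module and the corresponding labeling results are provided to the annotator account so that the annotator account processes the labeling task by reviewing them, the user only needs to review the labeling results; most of the labeling is completed automatically by the model, and the task as a whole is completed by human-machine collaboration, which improves labeling efficiency.
It should be emphasized that, to further ensure the privacy and security of the review results and/or labeling results, they may also be stored in a node of a blockchain; that is, the data labeling system may be deployed on a blockchain.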
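
To illustrate the idea of storing results in cryptographically linked blocks, here is a minimal hash-chain sketch. It is not a real blockchain (there is no network, consensus mechanism, or smart contract), and the block layout and function names are assumptions made purely for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash preceding the first block


def _digest(prev_hash, payload):
    # Hash the previous block's hash together with this block's payload.
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def make_block(prev_hash, payload):
    """Wrap a review or labeling result in a block linked to its predecessor."""
    return {"prev": prev_hash, "payload": payload, "hash": _digest(prev_hash, payload)}


def verify_chain(chain):
    """Return True only if every block links to the previous one and is unmodified."""
    prev = GENESIS
    for block in chain:
        if block["prev"] != prev or block["hash"] != _digest(block["prev"], block["payload"]):
            return False
        prev = block["hash"]
    return True
```

Because each hash covers the previous block's hash, altering a stored review result invalidates that block and every block after it, which is the tamper-evidence property the text relies on.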
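A sending module 260 for sending the target data and the review results and/or labeling results corresponding to each item of target data to the administrator account.
For target data that has a corresponding review result, either the target data together with all of its corresponding review results and labeling results may be sent to the administrator account, or only the target data and the corresponding review result may be sent. For target data that has only a labeling result and no review result, the target data and the corresponding labeling result may be sent to the administrator account.
In one embodiment, the permissions of the administrator account further include uploading sample data corresponding to the labeling task and the labeling results corresponding to the sample data, and the data labeling system further includes:
an automatic training module for receiving a plurality of sample data items that correspond to the labeling task and are uploaded by the administrator account, together with the labeling result corresponding to each sample data item, so that when none of the data labeling models of the automatic labeling module matches the labeling task, the sample data and corresponding labeling results are used to optimize the data labeling model in the automatic labeling module that best matches the labeling task, and the optimized model is used as the data labeling model that matches the labeling task.
For example, the automatic labeling module includes model information corresponding to the data labeling models, and the administrator module is used to send a task creation page to the administrator account. The task creation page includes the information corresponding to each data labeling model, an input box for task description information, and a button with which the administrator account reports that no data labeling model matches the labeling task. When this button is clicked, it is taken that none of the data labeling models matches the labeling task. The administrator account submits the task description information through the task creation page, the automatic training module determines the data labeling model that best matches the labeling task based on the model information and the task description information, and the sample data and corresponding labeling results are then used to optimize that model.
For example, suppose the labeling task is to label whether an image contains a horse, and the existing data labeling models include no model for labeling whether an image contains a horse, only a model for labeling whether an image contains a sheep. In this case, image sample data labeled with whether a horse is present can be used to optimize the sheep model so that the optimized model can process the labeling task.
In this embodiment, the optimized data labeling model can be used to process the labeling task. Even when no data labeling model matches the labeling task, the automatic training module can quickly optimize a model on a small number of samples, so that the data for the labeling task can be labeled automatically.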
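In one embodiment, the review results and/or labeling results in the data labeling system are stored in a blockchain, the permissions of the administrator account further include uploading a custom data labeling model and using it to process labeling tasks, and the data labeling system further includes:
a custom module for obtaining the custom data labeling model uploaded by the administrator account and, according to an instruction from the administrator account, using the custom data labeling model to process the labeling task created by the administrator account.
In this embodiment, allowing the administrator account to upload its own data labeling model lets users complete labeling tasks with their own models, providing a more customized and efficient labeling service.
In one embodiment, the administrator module is further used to:
obtain the label type submitted by the administrator account when creating the labeling task, the label type being either single-level or multi-level. When the label type submitted by the administrator account is multi-level, the administrator account also submits the hierarchy information of the label and the range information of each level, where the hierarchy information is the number of sub-labels at each level under the label and the relationships among the sub-labels, and the range information of each level is the content of the sub-labels.
In this embodiment, because the data labeling system supports multi-level labels, it can process finer-grained labeling tasks.
FIG. 3 is a schematic diagram of the classification of multi-level labels and single-level labels according to an exemplary embodiment. As shown in FIG. 3, "car" and "person" are multi-level labels, and "license plate recognition" is a single-level label. The "car" label includes two sub-labels; this can be hierarchy information. The "color" sub-label belongs to the "car" label, and the "yellow", "blue", and "purple" sub-labels belong to the "color" sub-label; this can be the range information of each level.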
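
One possible way to represent single-level and multi-level labels is a nested mapping, sketched below. The representation and helper names are assumptions. Only the sub-labels named in the text are included: the second sub-label of "car" and the sub-labels of "person" are not named there, so they are left out.

```python
# Nested dict: label -> sub-labels; an empty dict means "no further levels".
LABELS = {
    "car": {
        "color": {"yellow": {}, "blue": {}, "purple": {}},
        # The text says "car" has a second sub-label but does not name it.
    },
    "license plate recognition": {},  # single-level label
}


def is_multi_level(tree, name):
    """A top-level label is multi-level if it has at least one sub-label."""
    return bool(tree[name])


def leaf_paths(tree, prefix=()):
    """List every full path from a top-level label down to a leaf sub-label."""
    paths = []
    for name, children in tree.items():
        path = prefix + (name,)
        if children:
            paths.extend(leaf_paths(children, path))
        else:
            paths.append(path)
    return paths
```

In this sketch the hierarchy information corresponds to the shape of the nesting and the range information to the keys at each level.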
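In one embodiment, the administrator account is a project administrator account, the accounts of the data labeling system further include a system administrator account, the permissions of the system administrator account maintained by the account management module include permission to review account information, and the data labeling system further includes:
a registration module for obtaining registration information submitted by a project administrator, the registration information being used to create a project administrator account for the project administrator;
a system administrator module for providing the registration information obtained by the registration module to the system administrator account for review and, when the review is passed, creating the project administrator account corresponding to the registration information.
This embodiment allows project administrators to create project administrator accounts on the data labeling system themselves.
In one embodiment, the permissions of the project administrator account maintained by the account management module further include permission to set the target annotator account that processes the labeling task, the administrator module is further used to obtain the target annotator account configured by the project administrator account to process the labeling task, and the annotator module is further used to:
provide the target annotator account with the target data already labeled by the automatic labeling module and the corresponding labeling results, so that the target annotator account processes the labeling task by reviewing the labeling results of the labeled target data, and receive the review results returned by the target annotator account.
In this embodiment, because the project administrator account can freely set which annotator accounts may process the labeling task, it can select specific annotator accounts for the task, which improves the user experience.
In one embodiment, the permissions of the project administrator account maintained by the account management module further include permission to set a first proportion, namely the proportion of the labeling results of the labeled target data that the annotator account processing the labeling task reviews; the administrator module is further used to obtain the first proportion configured by the project administrator account; and the annotator module is further used to:
randomly select the first proportion of the labeled target data and corresponding labeling results from the target data already labeled by the automatic labeling module and the corresponding labeling results, and provide the selection to the annotator account, so that the annotator account processes the labeling task by reviewing the labeling results corresponding to the randomly selected labeled target data, and receive the review results returned by the annotator account.
In this embodiment, the project administrator account is given the authority to decide what proportion of the target data and corresponding labeling results is provided to the annotator account for review.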
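
The random selection of a configured proportion of labeled data for review can be sketched as below. The function name, the rounding rule, and the optional `seed` parameter are assumptions; the application only states that the first proportion is selected at random.

```python
import random


def sample_for_review(labelled_items, proportion, seed=None):
    """Randomly pick `proportion` of the (data, label) pairs for human review.

    Assumption: the number of items is rounded to the nearest integer.
    """
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must be between 0 and 1")
    items = list(labelled_items)
    rng = random.Random(seed)  # a seed makes the selection reproducible
    k = round(len(items) * proportion)
    return rng.sample(items, k)  # sampling without replacement
```

The same helper would apply unchanged to the second proportion that a later embodiment configures for the auditor account.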
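In one embodiment, the accounts of the data labeling system further include an auditor account. The permissions of the auditor account maintained by the account management module include permission to audit the review results of annotator accounts and permission to obtain the audit conclusions corresponding to the review results of the annotator accounts it has audited. The permissions of the project administrator account maintained by the account management module further include permission to obtain the audit conclusions for annotator accounts, where the annotator accounts are those processing labeling tasks created by that project administrator account. The data labeling system further includes:
an auditor module for, after obtaining from the annotator module the target data and the review results and/or labeling results corresponding to each item of target data, sending at least part of the target data and the corresponding review results and/or labeling results to the auditor account for audit, and receiving from the auditor account the audit results for at least part of the review results and/or labeling results corresponding to each item of target data;
a decision module for generating an audit conclusion based on the audit results provided by the auditor module;
an information presentation module for returning the audit conclusion to the auditor account and/or the project administrator account in response to a request from the auditor account and/or the project administrator account.
The audit results may include, for example, whether the annotator account's review is correct, whether the labeling is correct, and which target data were reviewed incorrectly. The audit conclusion is summary information about the audit results. Generating the audit conclusion automatically reduces the manual input required of auditors and speeds up auditing.
In this embodiment, allowing the auditor module to audit the work of the annotator module provides supervision and enables data labeling tasks to be completed more efficiently.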
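
As an illustration of how a decision module might turn per-item audit results into a summary audit conclusion, consider the sketch below. The `AuditResult` fields and the shape of the returned summary are hypothetical; the application only says that the conclusion summarizes the audit results.

```python
from dataclasses import dataclass


@dataclass
class AuditResult:
    data_id: str          # identifier of the audited target data
    review_correct: bool  # auditor's verdict on the annotator's review


def build_audit_conclusion(results):
    """Summarize audit results: totals plus the ids that were reviewed incorrectly."""
    incorrect_ids = [r.data_id for r in results if not r.review_correct]
    total = len(results)
    return {
        "audited": total,
        "correct": total - len(incorrect_ids),
        "incorrect_ids": incorrect_ids,
    }
```

For two correct reviews and one incorrect review of item "b", the summary would report three audited items, two correct, and `["b"]` as the incorrectly reviewed ids.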
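In one embodiment, the permissions of the project administrator account maintained by the account management module further include obtaining the scoring results for the labeler accounts that process labeling tasks created by that project administrator account, the permissions of the labeler account maintained by the account management module further include obtaining the scoring result for that labeler account, and the labeler module is further configured to generate completion progress information based on the labeler account's completion of the labeling task;
the decision module is further configured to obtain the labeler account's completion progress information for the labeling task, and to score the labeler account based on the audit results and the completion progress information;
the information presentation module is further configured to return the scoring result for the labeler account to the labeler account and/or the project administrator account in response to a request from the labeler account and/or the project administrator account.
The completion progress information may include, for example, a first percentage, namely the portion of the labeling task that has been completed. From the audit results, a second percentage can be computed, namely the number of correctly reviewed target data items as a share of all reviewed target data items. A score can then be derived from the first percentage and the second percentage according to some rule, and that score is used as the scoring result for the labeler account.
Feeding the scoring result back to the labeler account provides positive feedback and motivation, which can improve how well the labeler account handles labeling tasks; feeding it back to the project administrator account lets the project administrator account know how the labeler account is handling the labeling task.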
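The text leaves the scoring rule open ("some rule"). As one possible instance, the sketch below combines the first and second percentages with a weighted average on a 0 to 100 scale. The function name, the `progress_weight` parameter, and the weighted-average rule itself are assumptions, not the rule used by the application.

```python
def score_labeler(completed, total, correct_reviews, audited_reviews, progress_weight=0.5):
    """Score a labeler account from task progress and audited review accuracy."""
    if total <= 0 or audited_reviews <= 0:
        raise ValueError("total and audited_reviews must be positive")
    first_pct = completed / total                   # share of the task completed
    second_pct = correct_reviews / audited_reviews  # share of reviews judged correct
    score = 100 * (progress_weight * first_pct + (1 - progress_weight) * second_pct)
    return round(score, 2)


# 80 of 100 items done, 45 of 50 audited reviews judged correct.
example_score = score_labeler(80, 100, 45, 50)
```

Raising `progress_weight` rewards throughput more, while lowering it rewards accuracy more.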
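In one embodiment, the permissions of the project administrator account maintained by the account management module further include permission to set the target auditor accounts that process the labeling task, the administrator module is further configured to obtain the target auditor accounts configured by the project administrator account for processing the labeling task, and the auditor module is further configured to:
after obtaining from the labeler module the target data and the review results and/or labeling results corresponding to each item of target data, send at least part of the review results and/or labeling results corresponding to each item of target data to the target auditor account for auditing, and receive from the target auditor account the audit results for at least part of the review results and/or labeling results corresponding to each item of target data.
In this embodiment, allowing the project administrator account to set freely which auditor accounts may process the labeling task improves the user experience.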
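In one embodiment, the permissions of the project administrator account maintained by the account management module further include permission to set the labeling method to be used for processing the labeling task, the administrator module is further configured to obtain the labeling method uploaded by the project administrator account and provide it to the labeler module, and the labeler module is further configured to provide the labeling method to the labeler account that processes the labeling task created by the project administrator account, so that the labeler account performs the labeling task according to the labeling method.
The labeling method provided by the project administrator account may be supplied by the project administrator account itself, or it may be determined by the project administrator account's selection after the administrator module pushes a plurality of labeling methods to the project administrator account.
For example, a plurality of labeling methods are preconfigured in the data labeling system, and the administrator module pushes a page containing these labeling methods to the project administrator account; after the project administrator account selects a labeling method on that page, the administrator module takes the selected method as the labeling method provided by the project administrator account.
The labeling method is the way in which the labeler account performs labeling, for example whether labels are applied by ticking a checkbox or by clicking a button, or whether the labeler marks a result as right or wrong rather than entering the exact result.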
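In one embodiment, the permissions of the project administrator account maintained by the account management module further include permission to obtain each labeler account's completion progress information for the labeling tasks created by that project administrator account, and the administrator module is further configured to return, in response to a request from the project administrator account, the completion progress information of at least one labeler account for the labeling tasks created by the project administrator account.
The completion progress information can be obtained by writing code that counts how much of the labeling task each labeler account has completed.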
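A minimal sketch of such counting code follows. The record layout (one `(labeler, done)` pair per assigned item) and the function name `completion_progress` are assumptions chosen for illustration.

```python
from collections import Counter


def completion_progress(records):
    """Compute each labeler account's completed fraction of its assigned items.

    records: iterable of (labeler_account, done) pairs, one per assigned item.
    """
    assigned = Counter()
    finished = Counter()
    for labeler, done in records:
        assigned[labeler] += 1
        if done:
            finished[labeler] += 1
    return {labeler: finished[labeler] / assigned[labeler] for labeler in assigned}


progress = completion_progress([
    ("labeler_a", True),
    ("labeler_a", False),
    ("labeler_b", True),
])
```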
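In one embodiment, the permissions of the project administrator account maintained by the account management module further include permission to set a second proportion, namely the proportion of review results from the labeler module that the auditor account is to audit; the administrator module is further configured to obtain the second proportion configured by the project administrator account, and the auditor module is further configured to:
after obtaining from the labeler module the target data and the review results and/or labeling results corresponding to each item of target data, randomly select the second proportion of the target data and the corresponding review results and/or labeling results, send them to the auditor account for auditing, and receive from the auditor account the audit results for the second proportion of the review results and/or labeling results corresponding to each item of target data.
In this embodiment, the project administrator account is given the permission to decide what proportion of the target data and corresponding review results and/or labeling results the auditor module provides to the auditor account for auditing, which improves the user experience.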
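In one embodiment, the permissions of the project administrator account maintained by the account management module further include permission to set an optimization mode, the optimization mode being used to optimize the data labeling model that processes the labeling tasks created by that project administrator account, and the administrator module is further configured to obtain the optimization mode configured by the project administrator account.
In one embodiment, the optimization mode includes periodically obtaining, through the administrator module, labeled data corresponding to the labeling task uploaded by the project administrator account, in order to train the data labeling model that processes the labeling tasks created by that project administrator account.
In one embodiment, the optimization mode includes optimization conditions and the optimization means corresponding to the optimization conditions.
In one embodiment, the optimization conditions include: the number of target data items whose labeling result is inconsistent with the review result reaching a predetermined number threshold; and the ratio of the number of target data items whose labeling result is inconsistent with the review result to the number of all review results reaching a predetermined ratio threshold. The optimization means corresponding to the optimization conditions include: obtaining, through the administrator module, a plurality of pre-labeled data corresponding to the labeling task and the corresponding labeling results uploaded by the project administrator account, in order to train the data labeling model that processes the labeling task; and sending, through the labeler module, the target data labeled by the automatic labeling module together with the corresponding review results to the data labeling model that processes the labeling task, for training.
An optimization condition is a condition indicating that the data labeling model cannot complete the labeling task accurately and therefore needs to be optimized. Take the condition "the number of target data items whose labeling result is inconsistent with the review result reaches a predetermined number threshold": for a given set of target data, it indicates that the data labeling model's labeling results frequently disagree with the review results produced by the labeler account for those labeling results. Because manual review results are generally more accurate than the model's labeling results, this condition indicates that the model is not handling the labeling task accurately enough; the model can then be optimized in the manner described above, improving its accuracy on the labeling task.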
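The two optimization conditions can be checked with a few lines of code. In this sketch the function name `needs_optimization` and the choice to trigger when either configured threshold is reached are assumptions; the application lists the two conditions without stating how they combine.

```python
def needs_optimization(pairs, count_threshold=None, ratio_threshold=None):
    """Decide whether the data labeling model should be retrained.

    pairs: iterable of (labeling_result, review_result) for reviewed items.
    count_threshold: trigger when the number of mismatches reaches this value.
    ratio_threshold: trigger when mismatches / reviewed items reaches this value.
    """
    pairs = list(pairs)
    mismatches = sum(1 for labeled, reviewed in pairs if labeled != reviewed)
    if count_threshold is not None and mismatches >= count_threshold:
        return True
    if ratio_threshold is not None and pairs and mismatches / len(pairs) >= ratio_threshold:
        return True
    return False


reviewed = [("cat", "cat"), ("cat", "dog"), ("dog", "dog"), ("dog", "cat")]
retrain = needs_optimization(reviewed, count_threshold=2)
```

When the check returns true, the mismatched items and their review results are natural candidates for the training data described above.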
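It should be noted that the connections between the modules in the system architecture diagram of the data labeling system shown in FIG. 2 are exemplary; in practice the modules may be connected in various ways, and this application places no limitation on this.
In summary, the data labeling system provided by the embodiment of FIG. 2 labels the target data with its automatic labeling module, which improves labeling efficiency and reduces labeling cost; in addition, because the labeler module can hand the automatic labeling module's labeling results to a labeler account for review, the system balances labeling efficiency against labeling accuracy.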
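According to a second aspect of this application, an electronic device capable of implementing the above data labeling system is further provided, wherein the data labeling system includes:
an account management module, configured to maintain the accounts of the data labeling system and the permissions corresponding to each account, the accounts of the data labeling system including an administrator account and a labeler account, the permissions of the administrator account including creating labeling tasks, and the permissions of the labeler account including processing labeling tasks;
an administrator module, configured to create a labeling task according to an instruction of the administrator account;
a data receiving module, configured to receive target data corresponding to the labeling task and uploaded by the administrator account;
an automatic labeling module, including a plurality of data labeling models, each data labeling model being used to process a labeling task matching that data labeling model so as to label the target data corresponding to the labeling task and obtain a labeling result for the target data;
a labeler module, configured to provide the labeler account with the target data already labeled by the automatic labeling module and the corresponding labeling results, so that the labeler account processes the labeling task by reviewing the labeling results of the labeled target data, and to receive the review results returned by the labeler account; or
to provide the labeler account with the unlabeled target data, so that the labeler account processes the labeling task by labeling the unlabeled target data, and to receive the labeling results returned by the labeler account;
a sending module, configured to send the target data and the review results and/or labeling results corresponding to each item of target data to the administrator account.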
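The account and permission bookkeeping performed by the account management module can be sketched as a role-to-permission lookup. The names `Permission`, `ROLE_PERMISSIONS`, and `AccountManager` are hypothetical; only the two permissions named in the text (creating and processing labeling tasks) are modeled.

```python
from enum import Enum


class Permission(Enum):
    CREATE_TASK = "create_task"
    PROCESS_TASK = "process_task"


# Baseline permissions named in the text for the two account types.
ROLE_PERMISSIONS = {
    "administrator": {Permission.CREATE_TASK},
    "labeler": {Permission.PROCESS_TASK},
}


class AccountManager:
    """Maintains accounts and answers permission checks for other modules."""

    def __init__(self):
        self._roles = {}

    def add_account(self, name, role):
        if role not in ROLE_PERMISSIONS:
            raise ValueError(f"unknown role: {role}")
        self._roles[name] = role

    def is_allowed(self, name, permission):
        role = self._roles.get(name)
        return role is not None and permission in ROLE_PERMISSIONS[role]


manager = AccountManager()
manager.add_account("admin_1", "administrator")
manager.add_account("labeler_1", "labeler")
```

Other modules, such as the administrator module, would call `is_allowed` before acting on an account's instruction.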
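Those skilled in the art will understand that aspects of this application may be implemented as a system, a method, or a program product. Accordingly, aspects of this application may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may collectively be referred to herein as a "circuit", "module", or "system".
An electronic device 400 according to this embodiment of the application is described below with reference to FIG. 4. The electronic device 400 shown in FIG. 4 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of this application.
As shown in FIG. 4, the electronic device 400 takes the form of a general-purpose computing device. The components of the electronic device 400 may include, but are not limited to: the at least one processing unit 410, the at least one storage unit 420, and a bus 430 connecting the different system components (including the storage unit 420 and the processing unit 410).
The storage unit stores program code that can be executed by the processing unit 410, causing the processing unit 410 to perform the steps according to the various exemplary embodiments of this application described in the "Embodiments" section above.
The storage unit 420 may include computer-readable storage media in the form of volatile storage units, such as a random access memory (RAM) 421 and/or a cache storage unit 422, and may further include a read-only memory (ROM) 423.
The storage unit 420 may also include a program/utility 424 having a set of (at least one) program modules 425. Such program modules 425 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
The bus 430 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus structures.
The electronic device 400 may also communicate with one or more external devices 600 (such as a keyboard, a pointing device, or a Bluetooth device), with one or more devices that enable a user to interact with the electronic device 400, and/or with any device (such as a router or a modem) that enables the electronic device 400 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 450. The electronic device 400 may also communicate with one or more networks (for example a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 460. As shown in the figure, the network adapter 460 communicates with the other modules of the electronic device 400 through the bus 430. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device 400, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
From the description of the above embodiments, those skilled in the art will readily understand that the example embodiments described here may be implemented in software, or in software combined with the necessary hardware. Accordingly, the technical solution according to the embodiments of this application may be embodied as a software product, which may be stored in a non-volatile computer-readable storage medium (such as a CD-ROM, a USB flash drive, or a portable hard disk) or on a network, and which includes instructions that cause a computing device (such as a personal computer, a server, a terminal device, or a network device) to implement the data labeling system according to the embodiments of this application.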
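According to a third aspect of this application, a computer-readable storage medium is further provided. The computer-readable storage medium may be non-volatile or volatile, and stores a program product capable of implementing the data labeling system described above in this specification, wherein the data labeling system includes: an account management module, configured to maintain the accounts of the data labeling system and the permissions corresponding to each account, the accounts of the data labeling system including an administrator account and a labeler account, the permissions of the administrator account including creating labeling tasks, and the permissions of the labeler account including processing labeling tasks; an administrator module, configured to create a labeling task according to an instruction of the administrator account; a data receiving module, configured to receive target data corresponding to the labeling task and uploaded by the administrator account; an automatic labeling module, including a plurality of data labeling models, each data labeling model being used to process a labeling task matching that data labeling model so as to label the target data corresponding to the labeling task and obtain a labeling result for the target data; a labeler module, configured to provide the labeler account with the target data already labeled by the automatic labeling module and the corresponding labeling results, so that the labeler account processes the labeling task by reviewing the labeling results of the labeled target data, and to receive the review results returned by the labeler account, or to provide the labeler account with the unlabeled target data, so that the labeler account processes the labeling task by labeling the unlabeled target data, and to receive the labeling results returned by the labeler account; and a sending module, configured to send the target data and the review results and/or labeling results corresponding to each item of target data to the administrator account. In some possible implementations, aspects of this application may also be implemented in the form of a program product that includes program code; when the program product runs on a terminal device, the program code causes the terminal device to perform the steps according to the various exemplary embodiments of this application described in the "Exemplary System" section above.
Referring to FIG. 5, a program product 500 for implementing the above data labeling system according to an embodiment of this application is described. It is stored on a computer-readable storage medium, may take the form of a portable compact disc read-only memory (CD-ROM) containing program code, and may run on a terminal device such as a personal computer. However, the program product of this application is not limited to this; in this document, a readable storage medium may be any tangible medium that contains or stores a program, which may be used by or in combination with an instruction execution system, apparatus, or device.
The program product may use any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device.
The program code contained on a readable medium may be transmitted over any suitable medium, including but not limited to wireless, wired, optical cable, RF, or any suitable combination of the above.
The program code for carrying out the operations of this application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computing device (for example, through the Internet using an Internet service provider).
In addition, the above drawings are merely schematic illustrations of the processing included in the data labeling system according to the exemplary embodiments of this application and are not intended to be limiting. It is easy to understand that the processing shown in the drawings does not indicate or limit the chronological order of these processes. It is also easy to understand that these processes may be executed, for example, synchronously or asynchronously in multiple modules.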
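The blockchain referred to in this application is a new application model of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked to one another by cryptographic methods, in which each data block contains the information of a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, and an application service layer.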
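The idea of data blocks linked by cryptographic hashes can be illustrated with a single-process hash chain. This sketch is not a decentralized ledger and includes no consensus mechanism or peer-to-peer transmission; the function names and the block layout are assumptions, and SHA-256 is chosen only as a familiar hash function.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64


def block_hash(index, prev_hash, records):
    # Hash a canonical JSON encoding of the block contents.
    payload = json.dumps(
        {"index": index, "prev_hash": prev_hash, "records": records},
        sort_keys=True,
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, records):
    """Append a block holding a batch of labeling/review records."""
    index = len(chain)
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    block = {
        "index": index,
        "prev_hash": prev_hash,
        "records": records,
        "hash": block_hash(index, prev_hash, records),
    }
    chain.append(block)
    return block


def verify(chain):
    """Return True if every block still matches its hash and its predecessor."""
    prev_hash = GENESIS_HASH
    for index, block in enumerate(chain):
        expected = block_hash(index, prev_hash, block["records"])
        if block["index"] != index or block["prev_hash"] != prev_hash or block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True


chain = []
append_block(chain, [{"item": "img_1", "label": "car", "review": "car"}])
append_block(chain, [{"item": "img_2", "label": "car", "review": "person"}])
```

Because each block's hash covers the previous block's hash, changing a stored review result invalidates that block and every block after it.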
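It should be understood that this application is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of this application is limited only by the appended claims.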

Claims (20)

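  1. A data labeling system, wherein the data labeling system comprises:
    an account management module, configured to maintain the accounts of the data labeling system and the permissions corresponding to each account, the accounts of the data labeling system including an administrator account and a labeler account, the permissions of the administrator account including creating labeling tasks, and the permissions of the labeler account including processing labeling tasks;
    an administrator module, configured to create a labeling task according to an instruction of the administrator account;
    a data receiving module, configured to receive target data corresponding to the labeling task and uploaded by the administrator account;
    an automatic labeling module, including a plurality of data labeling models, each data labeling model being used to process a labeling task matching that data labeling model so as to label the target data corresponding to the labeling task and obtain a labeling result for the target data;
    a labeler module, configured to provide the labeler account with the target data already labeled by the automatic labeling module and the corresponding labeling results, so that the labeler account processes the labeling task by reviewing the labeling results of the labeled target data, and to receive the review results returned by the labeler account; or
    to provide the labeler account with the unlabeled target data, so that the labeler account processes the labeling task by labeling the unlabeled target data, and to receive the labeling results returned by the labeler account;
    a sending module, configured to send the target data and the review results and/or labeling results corresponding to each item of target data to the administrator account.
  2. The data labeling system according to claim 1, wherein the permissions of the administrator account further include uploading sample data corresponding to a labeling task and the labeling results corresponding to the sample data, and the data labeling system further comprises:
    an automatic training module, configured to receive a plurality of sample data corresponding to the labeling task and the labeling results corresponding to each sample data uploaded by the administrator account, so that, when none of the plurality of data labeling models of the automatic labeling module matches the labeling task, the data labeling model in the automatic labeling module that best matches the labeling task is optimized using the plurality of sample data and the labeling results corresponding to each sample data, and the optimized best-matching data labeling model is used as the data labeling model matching the labeling task.
  3. The data labeling system according to claim 1, wherein the review results and/or the labeling results in the data labeling system are stored in a blockchain, the permissions of the administrator account further include uploading a custom data labeling model and using the custom data labeling model to process labeling tasks, and the data labeling system further comprises:
    a customization module, configured to obtain the custom data labeling model uploaded by the administrator account and, according to an instruction of the administrator account, use the custom data labeling model to process a labeling task created by the administrator account.
  4. The data labeling system according to claim 1, wherein the administrator module is further configured to:
    obtain the label type submitted by the administrator account when creating a labeling task, the label type including single-level labels and multi-level labels, wherein, when the label type submitted by the administrator account is a multi-level label, the administrator account further submits level information of the label and range information of each level, the level information being the number of sub-labels at each level under the label and the relationships among the sub-labels, and the range information of each level being the content of the sub-labels.
  5. The data labeling system according to any one of claims 1 to 4, wherein the administrator account is a project administrator account, the accounts of the data labeling system further include a system administrator account, the permissions of the system administrator account maintained by the account management module include permission to review account information, and the data labeling system further comprises:
    a registration module, configured to obtain registration information submitted by a project administrator, the registration information being used to create a project administrator account for the project administrator;
    a system administrator module, configured to provide the registration information obtained by the registration module to the system administrator account for review, and to create the project administrator account corresponding to the registration information when the review is passed.
  6. The data labeling system according to claim 5, wherein the permissions of the project administrator account maintained by the account management module further include permission to set the target labeler accounts that process the labeling task, the administrator module is further configured to obtain the target labeler accounts configured by the project administrator account for processing the labeling task, and the labeler module is further configured to:
    provide the target labeler account with the target data already labeled by the automatic labeling module and the corresponding labeling results, so that the target labeler account processes the labeling task by reviewing the labeling results of the labeled target data, and receive the review results returned by the target labeler account.
  7. The data labeling system according to claim 5, wherein the permissions of the project administrator account maintained by the account management module further include permission to set a first proportion, namely the proportion of the labeling results of the labeled target data that the labeler account processing the labeling task is to review, the administrator module is further configured to obtain the first proportion configured by the project administrator account, and the labeler module is further configured to:
    randomly select the first proportion of the labeled target data and corresponding labeling results from the target data already labeled by the automatic labeling module and the corresponding labeling results, and provide them to the labeler account, so that the labeler account processes the labeling task by reviewing the randomly selected labeling results corresponding to the labeled target data, and receive the review results returned by the labeler account.
  8. The data labeling system according to claim 5, wherein the accounts of the data labeling system further include an auditor account, the permissions of the auditor account maintained by the account management module include permission to audit the review results of a labeler account and permission to obtain the audit conclusion corresponding to the audited review results of that labeler account, and the permissions of the project administrator account maintained by the account management module further include permission to obtain the audit conclusion for a labeler account, wherein the labeler account is a labeler account that processes a labeling task created by that project administrator account, and the data labeling system further comprises:
    an auditor module, configured to, after obtaining from the labeler module the target data and the review results and/or labeling results corresponding to each item of target data, send at least part of the target data and the corresponding review results and/or labeling results to the auditor account for auditing, and to receive from the auditor account the audit results for at least part of the review results and/or labeling results corresponding to each item of target data;
    a decision module, configured to generate an audit conclusion based on the audit results provided by the auditor module;
    an information presentation module, configured to return the audit conclusion to the auditor account and/or the project administrator account in response to a request from the auditor account and/or the project administrator account.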
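  9. A computer-readable storage medium, wherein the computer-readable storage medium stores computer program instructions which, when executed by a computer, cause the computer to implement a data labeling system;
    wherein the data labeling system comprises:
    an account management module, configured to maintain the accounts of the data labeling system and the permissions corresponding to each account, the accounts of the data labeling system including an administrator account and a labeler account, the permissions of the administrator account including creating labeling tasks, and the permissions of the labeler account including processing labeling tasks;
    an administrator module, configured to create a labeling task according to an instruction of the administrator account;
    a data receiving module, configured to receive target data corresponding to the labeling task and uploaded by the administrator account;
    an automatic labeling module, including a plurality of data labeling models, each data labeling model being used to process a labeling task matching that data labeling model so as to label the target data corresponding to the labeling task and obtain a labeling result for the target data;
    a labeler module, configured to provide the labeler account with the target data already labeled by the automatic labeling module and the corresponding labeling results, so that the labeler account processes the labeling task by reviewing the labeling results of the labeled target data, and to receive the review results returned by the labeler account; or
    to provide the labeler account with the unlabeled target data, so that the labeler account processes the labeling task by labeling the unlabeled target data, and to receive the labeling results returned by the labeler account;
    a sending module, configured to send the target data and the review results and/or labeling results corresponding to each item of target data to the administrator account.
  10. The computer-readable storage medium according to claim 9, wherein the permissions of the administrator account further include uploading sample data corresponding to a labeling task and the labeling results corresponding to the sample data, and the data labeling system further comprises:
    an automatic training module, configured to receive a plurality of sample data corresponding to the labeling task and the labeling results corresponding to each sample data uploaded by the administrator account, so that, when none of the plurality of data labeling models of the automatic labeling module matches the labeling task, the data labeling model in the automatic labeling module that best matches the labeling task is optimized using the plurality of sample data and the labeling results corresponding to each sample data, and the optimized best-matching data labeling model is used as the data labeling model matching the labeling task.
  11. The computer-readable storage medium according to claim 9, wherein the review results and/or the labeling results in the data labeling system are stored in a blockchain, the permissions of the administrator account further include uploading a custom data labeling model and using the custom data labeling model to process labeling tasks, and the data labeling system further comprises:
    a customization module, configured to obtain the custom data labeling model uploaded by the administrator account and, according to an instruction of the administrator account, use the custom data labeling model to process a labeling task created by the administrator account.
  12. The computer-readable storage medium according to claim 9, wherein the administrator module is further configured to:
    obtain the label type submitted by the administrator account when creating a labeling task, the label type including single-level labels and multi-level labels, wherein, when the label type submitted by the administrator account is a multi-level label, the administrator account further submits level information of the label and range information of each level, the level information being the number of sub-labels at each level under the label and the relationships among the sub-labels, and the range information of each level being the content of the sub-labels.
  13. The computer-readable storage medium according to any one of claims 9 to 12, wherein the administrator account is a project administrator account, the accounts of the data labeling system further include a system administrator account, the permissions of the system administrator account maintained by the account management module include permission to review account information, and the data labeling system further comprises:
    a registration module, configured to obtain registration information submitted by a project administrator, the registration information being used to create a project administrator account for the project administrator;
    a system administrator module, configured to provide the registration information obtained by the registration module to the system administrator account for review, and to create the project administrator account corresponding to the registration information when the review is passed.
  14. The computer-readable storage medium according to claim 13, wherein the permissions of the project administrator account maintained by the account management module further include permission to set the target labeler accounts that process the labeling task, the administrator module is further configured to obtain the target labeler accounts configured by the project administrator account for processing the labeling task, and the labeler module is further configured to:
    provide the target labeler account with the target data already labeled by the automatic labeling module and the corresponding labeling results, so that the target labeler account processes the labeling task by reviewing the labeling results of the labeled target data, and receive the review results returned by the target labeler account.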
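  15. An electronic device, wherein the electronic device comprises:
    a processor;
    a memory storing computer-readable instructions which, when executed by the processor, implement a data labeling system;
    wherein the data labeling system comprises:
    an account management module, configured to maintain the accounts of the data labeling system and the permissions corresponding to each account, the accounts of the data labeling system including an administrator account and a labeler account, the permissions of the administrator account including creating labeling tasks, and the permissions of the labeler account including processing labeling tasks;
    an administrator module, configured to create a labeling task according to an instruction of the administrator account;
    a data receiving module, configured to receive target data corresponding to the labeling task and uploaded by the administrator account;
    an automatic labeling module, including a plurality of data labeling models, each data labeling model being used to process a labeling task matching that data labeling model so as to label the target data corresponding to the labeling task and obtain a labeling result for the target data;
    a labeler module, configured to provide the labeler account with the target data already labeled by the automatic labeling module and the corresponding labeling results, so that the labeler account processes the labeling task by reviewing the labeling results of the labeled target data, and to receive the review results returned by the labeler account; or
    to provide the labeler account with the unlabeled target data, so that the labeler account processes the labeling task by labeling the unlabeled target data, and to receive the labeling results returned by the labeler account;
    a sending module, configured to send the target data and the review results and/or labeling results corresponding to each item of target data to the administrator account.
  16. The electronic device according to claim 15, wherein the permissions of the administrator account further include uploading sample data corresponding to a labeling task and the labeling results corresponding to the sample data, and the data labeling system further comprises:
    an automatic training module, configured to receive a plurality of sample data corresponding to the labeling task and the labeling results corresponding to each sample data uploaded by the administrator account, so that, when none of the plurality of data labeling models of the automatic labeling module matches the labeling task, the data labeling model in the automatic labeling module that best matches the labeling task is optimized using the plurality of sample data and the labeling results corresponding to each sample data, and the optimized best-matching data labeling model is used as the data labeling model matching the labeling task.
  17. The electronic device according to claim 15, wherein the review results and/or the labeling results in the data labeling system are stored in a blockchain, the permissions of the administrator account further include uploading a custom data labeling model and using the custom data labeling model to process labeling tasks, and the data labeling system further comprises:
    a customization module, configured to obtain the custom data labeling model uploaded by the administrator account and, according to an instruction of the administrator account, use the custom data labeling model to process a labeling task created by the administrator account.
  18. The electronic device according to claim 15, wherein the administrator module is further configured to:
    obtain the label type submitted by the administrator account when creating a labeling task, the label type including single-level labels and multi-level labels, wherein, when the label type submitted by the administrator account is a multi-level label, the administrator account further submits level information of the label and range information of each level, the level information being the number of sub-labels at each level under the label and the relationships among the sub-labels, and the range information of each level being the content of the sub-labels.
  19. The electronic device according to any one of claims 15 to 18, wherein the administrator account is a project administrator account, the accounts of the data labeling system further include a system administrator account, the permissions of the system administrator account maintained by the account management module include permission to review account information, and the data labeling system further comprises:
    a registration module, configured to obtain registration information submitted by a project administrator, the registration information being used to create a project administrator account for the project administrator;
    a system administrator module, configured to provide the registration information obtained by the registration module to the system administrator account for review, and to create the project administrator account corresponding to the registration information when the review is passed.
  20. The electronic device according to claim 19, wherein the permissions of the project administrator account maintained by the account management module further include permission to set the target labeler accounts that process the labeling task, the administrator module is further configured to obtain the target labeler accounts configured by the project administrator account for processing the labeling task, and the labeler module is further configured to:
    provide the target labeler account with the target data already labeled by the automatic labeling module and the corresponding labeling results, so that the target labeler account processes the labeling task by reviewing the labeling results of the labeled target data, and receive the review results returned by the target labeler account.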
PCT/CN2020/124738 2020-05-28 2020-10-29 Data labeling system, computer-readable storage medium, and electronic device WO2021139346A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010469546.7 2020-05-28
CN202010469546.7A CN111695613B (zh) 2020-05-28 2020-05-28 Data labeling system, computer-readable storage medium, and electronic device

Publications (1)

Publication Number Publication Date
WO2021139346A1 true WO2021139346A1 (zh) 2021-07-15

Family

ID=72478512

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/124738 WO2021139346A1 (zh) 2020-05-28 2020-10-29 Data labeling system, computer-readable storage medium, and electronic device

Country Status (2)

Country Link
CN (1) CN111695613B (zh)
WO (1) WO2021139346A1 (zh)

Also Published As

Publication number Publication date
CN111695613A (zh) 2020-09-22
CN111695613B (zh) 2023-01-24

Similar Documents

Publication Publication Date Title
WO2021139346A1 (zh) 数据标注系统、计算机可读存储介质及电子设备
US11775494B2 (en) Multi-service business platform system having entity resolution systems and methods
US20210342745A1 (en) Artificial intelligence model and data collection/development platform
US11842145B1 (en) Systems, devices, and methods for software coding
US11604980B2 (en) Targeted crowd sourcing for metadata management across data sets
US20180143975A1 (en) Collection strategies that facilitate arranging portions of documents into content collections
WO2021057318A1 (zh) 一种业务进度监控方法、装置、系统及计算机可读存储介质
US10511653B2 (en) Discussion-based document collaboration
US11531928B2 (en) Machine learning for associating skills with content
CN102880683B (zh) 一种可行性研究报告的自动网络生成系统及其生成方法
CN112199084B (zh) 基于Django的文本标注平台
TWI815140B (zh) 流程操作系統及流程操作方法
TW202213145A (zh) 文件機密等級管理系統及方法
WO2023040145A1 (zh) 基于人工智能的文本分类方法、装置、电子设备及介质
US11567948B2 (en) Autonomous suggestion of related issues in an issue tracking system
US20230418793A1 (en) Multi-service business platform system having entity resolution systems and methods
US11714813B2 (en) System and method for proposing annotations
US20220327124A1 (en) Machine learning for locating information in knowledge graphs
CN116868212A (zh) 术语定义的定制转换和质量评估
CN114676694A (zh) 业务模型的生成方法、装置、设备、介质和程序产品
CN111914136A (zh) 一种资源管理方法、装置、电子设备及存储介质
US20220358398A1 (en) Machine-learned models incorporating sequence encoders that operate on bag of words input
CN113255879B (zh) 一种深度学习标注方法、系统、计算机设备和存储介质
TW201931817A (zh) 網路使用者身份辨識方法與系統
CN117236659B (zh) 一种基于线上旅游平台的团计划管理方法及系统

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20912611

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20912611

Country of ref document: EP

Kind code of ref document: A1