CN108984490A - Data annotation method, apparatus, electronic device, and storage medium - Google Patents

Data annotation method, apparatus, electronic device, and storage medium

Info

Publication number
CN108984490A
Authority
CN
China
Prior art keywords
mark
subtask
data
task
terminal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810786768.4A
Other languages
Chinese (zh)
Inventor
武冰冰
田淑凤
岳学林
谢泽华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Orion Star Technology Co Ltd
Original Assignee
Beijing Orion Star Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Orion Star Technology Co Ltd
Priority to CN201810786768.4A
Publication of CN108984490A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • G06F40/117 Tagging; Marking up; Designating a block; Setting of attributes

Abstract

The present invention relates to the field of Internet technology, and in particular to a data annotation method, apparatus, electronic device, and storage medium. The method obtains the annotation results of at least two annotation subtasks included in a data annotation task, where the at least two annotation subtasks are each carried out on a corresponding task execution terminal, and determines the data annotation result of the data annotation task based on the annotation results of the at least two annotation subtasks. The final data annotation result is thus determined from the results of multiple annotation subtasks, and each annotation subtask can be executed on a suitable task execution terminal, which improves annotation efficiency and accuracy and reduces cost.

Description

Data annotation method, apparatus, electronic device, and storage medium
Technical field
The present invention relates to the field of Internet technology, and in particular to a data annotation method, apparatus, electronic device, and storage medium.
Background
At present, the functions or services of a system platform usually depend on large amounts of data. For example, a large number of training samples must be obtained to train a model, and how to obtain such training sample data effectively is a problem that urgently needs to be solved. In the prior art, training samples are annotated mainly by hand by a limited number of professionals, so output efficiency is low.
Summary of the invention
Embodiments of the present invention provide a data annotation method, apparatus, electronic device, and storage medium to solve the problem of low data annotation efficiency in the prior art.
The specific technical solutions provided by the embodiments of the present invention are as follows:
One embodiment of the present invention provides a data annotation method, comprising:
obtaining the annotation results of at least two annotation subtasks included in a data annotation task, wherein the at least two annotation subtasks are each carried out on a corresponding task execution terminal; and
determining the data annotation result of the data annotation task based on the annotation results of the at least two annotation subtasks.
In one embodiment of the present invention, the task execution terminals include user-annotator task execution terminals and professional-annotator task execution terminals; the task execution terminals corresponding to the at least two annotation subtasks are determined according to the capability requirement and/or annotation cost of the annotation task.
In one embodiment of the present invention, the at least two annotation subtasks include at least a first annotation subtask and a second annotation subtask; the first annotation subtask is a correct/incorrect judgment annotation task on the data, and the second annotation subtask is an error-cause annotation task and/or an error-correction annotation task on the data; the task execution terminal corresponding to the first annotation subtask is a user-annotator task execution terminal, and the task execution terminal corresponding to the second annotation subtask is a professional-annotator task execution terminal.
In one embodiment of the present invention, the task execution terminals include task execution terminals of different types of annotators; the task execution terminals corresponding to the at least two annotation subtasks are determined according to the types of the annotation subtasks.
In one embodiment of the present invention, each annotation subtask is carried out on one or more task execution terminals.
In one embodiment of the present invention, if an annotation subtask is carried out on multiple task execution terminals, the annotation result of that annotation subtask is determined as follows:
obtaining the multiple annotation results produced for the annotation subtask on the multiple task execution terminals; and
if the multiple annotation results are consistent, determining that consistent annotation result to be the annotation result of the annotation subtask.
In one embodiment of the present invention, the data annotation task is a speech-related data annotation task.
Another embodiment of the present invention provides a data annotation apparatus, comprising:
an obtaining module, configured to obtain the annotation results of at least two annotation subtasks included in a data annotation task, wherein the at least two annotation subtasks are each carried out on a corresponding task execution terminal; and
a determining module, configured to determine the data annotation result of the data annotation task based on the annotation results of the at least two annotation subtasks.
In another embodiment of the present invention, the task execution terminals include user-annotator task execution terminals and professional-annotator task execution terminals; the task execution terminals corresponding to the at least two annotation subtasks are determined according to the capability requirement and/or annotation cost of the annotation task.
In another embodiment of the present invention, the at least two annotation subtasks include at least a first annotation subtask and a second annotation subtask; the first annotation subtask is a correct/incorrect judgment annotation task on the data, and the second annotation subtask is an error-cause annotation task and/or an error-correction annotation task on the data; the task execution terminal corresponding to the first annotation subtask is a user-annotator task execution terminal, and the task execution terminal corresponding to the second annotation subtask is a professional-annotator task execution terminal.
In another embodiment of the present invention, the task execution terminals include task execution terminals of different types of annotators; the task execution terminals corresponding to the at least two annotation subtasks are determined according to the types of the annotation subtasks.
In another embodiment of the present invention, each annotation subtask is carried out on one or more task execution terminals.
In another embodiment of the present invention, if an annotation subtask is carried out on multiple task execution terminals, the annotation result of that annotation subtask is determined as follows:
the obtaining module is specifically configured to obtain the multiple annotation results produced for the annotation subtask on the multiple task execution terminals; and
the determining module is specifically configured to, if the multiple annotation results are consistent, determine that consistent annotation result to be the annotation result of the annotation subtask.
In another embodiment of the present invention, the data annotation task is a speech-related data annotation task.
Another embodiment of the present invention provides an electronic device, comprising:
at least one memory, configured to store program instructions; and
at least one processor, configured to call the program instructions stored in the memory and to execute any one of the above data annotation methods according to the obtained program instructions.
Another embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any one of the above data annotation methods.
In the embodiments of the present invention, the annotation results of at least two annotation subtasks included in a data annotation task are obtained, where the at least two annotation subtasks are each carried out on a corresponding task execution terminal, and the data annotation result of the data annotation task is determined based on the annotation results of the at least two annotation subtasks. In this way, a complex data annotation task can be divided into several small annotation subtasks that are executed on their corresponding task execution terminals. Each annotation subtask can be executed on a better-suited task execution terminal, which improves the accuracy and efficiency of annotating each subtask and reduces cost. Determining the data annotation result from the annotation results of the at least two annotation subtasks therefore improves annotation efficiency and solves the prior-art problem that all annotation must be done manually by professionals, with low efficiency.
Brief description of the drawings
Fig. 1 is a flowchart of the data annotation method in an embodiment of the present invention;
Fig. 2 is a schematic diagram of the user-facing front-end interface of the first annotation subtask in an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of the data annotation apparatus in an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of the electronic device in an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
At present, the functions or services of a system platform usually depend on large amounts of data. Taking training sample data as an example, training a model requires a large number of training samples to improve the accuracy of model training, but obtaining training samples in the early stage is a problem that urgently needs to be solved. In the prior art, training samples are annotated mainly by hand by a limited number of professionals, so annotation efficiency is low and the number of training samples obtained is small.
On this basis, the embodiments of the present invention observe that a data annotation task can usually be divided into several annotation stages, some of which place low demands on professional expertise while others place high demands. Therefore, in the embodiments of the present invention, a data annotation task may include at least two annotation subtasks, each carried out on a corresponding task execution terminal. For example, some of the annotation subtasks can be distributed through a crowdsourcing task platform to online task execution terminals and annotated by the users of those terminals, which improves efficiency. A task execution terminal may be an online user terminal, or a background server, a computer, or the like. This not only improves efficiency and accuracy, but also allows different task execution terminals to handle annotation subtasks in a targeted manner, reducing cost.
Crowdsourcing refers to outsourcing a task that a company or organization used to have its employees perform to an unspecified, and usually large, public network on a free and voluntary basis. A requester can publish a crowdsourcing task on the Internet through a crowdsourcing platform, users from the Internet complete the task, and users can receive a certain reward after completing it.
The embodiments of the present invention are mainly based on a crowdsourcing task platform and concern the annotation of training sample data for models that implement artificial intelligence functions, for example a model that recognizes user instructions, a model that answers questions, or a model that produces search results. The embodiments are of course not limited to the artificial intelligence field or to crowdsourcing task platforms; these are used only to explain the technical solutions clearly. The technical solutions provided by the embodiments of the present invention are equally applicable to similar problems in other service applications.
Fig. 1 shows a flowchart of the data annotation method in an embodiment of the present invention. The method includes:
Step 100: Obtain the annotation results of at least two annotation subtasks included in a data annotation task, where the at least two annotation subtasks are each carried out on a corresponding task execution terminal.
The data annotation task may be a crowdsourced data annotation task, but is of course not limited to crowdsourcing tasks.
The data annotation task may also be a speech-related data annotation task, for example a voice search annotation task or a voice chat annotation task. The embodiments of the present invention are of course not limited to speech-related data annotation tasks.
Further, in the embodiments of the present invention, an annotation subtask may also include the corresponding data to be annotated, the annotation mode, and so on. The annotation mode indicates the annotation operation to be performed on the task execution terminal, for example correct/incorrect judgment annotation or error-cause annotation.
In the embodiments of the present invention, a data annotation task may include at least two annotation subtasks, which may be associated with one another or independent of one another. Specifically, the at least two annotation subtasks may be determined in the following ways:
First way: according to the annotation modes in the data annotation task, with different annotation subtasks corresponding to different annotation modes.
Second way: according to the execution stages of the data annotation task, with each annotation subtask corresponding to a stage that can be executed independently.
Third way: according to the work requirements of manually annotating the data annotation task, for example capability requirements and annotation form, decomposing the data annotation task into several small annotation subtasks.
Of course, the embodiments of the present invention are not limited to the above ways; any other feasible way of dividing a data annotation task into at least two annotation subtasks also falls within the protection scope of the embodiments of the present invention.
Further, in the embodiments of the present invention, each annotation subtask may be carried out on one or more task execution terminals.
In that case, step 100 specifically includes: if an annotation subtask is carried out on multiple task execution terminals, determining the annotation result of that annotation subtask as follows:
Obtain the multiple annotation results produced for the annotation subtask on the multiple task execution terminals; if the multiple annotation results are consistent, determine that consistent annotation result to be the annotation result of the annotation subtask.
Carrying out a subtask on multiple task execution terminals in this way improves annotation accuracy. For example, if the annotation subtask is carried out on two task execution terminals and both resulting annotation results are "incorrect", the annotation result of the annotation subtask can be determined to be "incorrect".
Further, if an annotation subtask is carried out on multiple task execution terminals and the resulting annotation results are inconsistent, the data of that annotation subtask may be considered unusable, or may be handed to professionals to be annotated again.
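The consistency rule above can be sketched in a few lines of Python. This is only an illustration: the function name `consensus_result` and the use of `None` to signal "unusable, or hand to a professional" are assumptions, not part of the patent text.

```python
from typing import Hashable, Optional, Sequence


def consensus_result(results: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the shared annotation result if all terminals agree.

    Returns None when there are no results or the results differ; the
    caller may then discard the item or re-route it to a professional
    annotator (hypothetical convention for this sketch).
    """
    if not results:
        return None
    first = results[0]
    # Every terminal must report exactly the same result.
    return first if all(r == first for r in results) else None
```

For example, two terminals that both report "incorrect" yield "incorrect", while one "correct" and one "incorrect" yield `None`.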
Step 110: Determine the data annotation result of the data annotation task based on the annotation results of the at least two annotation subtasks.
In this way, the final data annotation result of the data annotation task is determined from the annotation results of its annotation subtasks, which can be integrated and combined. Because the annotation subtasks can be annotated on different task execution terminals, efficiency and accuracy are improved and cost is reduced.
The following describes in detail how, in step 100, the at least two annotation subtasks are each carried out on a corresponding task execution terminal.
A task execution terminal may be a user terminal, or a background server, a computer, or the like; the embodiments of the present invention impose no limitation. Specifically, the following cases are possible:
1) First case: the task execution terminals include user-annotator task execution terminals and professional-annotator task execution terminals; the task execution terminals corresponding to the at least two annotation subtasks are determined according to the capability requirement and/or annotation cost of the annotation subtasks.
For example, a user-annotator task execution terminal is a user terminal, and a professional-annotator task execution terminal is a background server, a computer, or the like.
That is, in the prior art all annotation is handed to professional annotators, so efficiency is very low. In the embodiments of the present invention, annotation subtasks can instead be handed to different task execution terminals according to their capability requirements and/or annotation cost. For example, because user annotators are usually ordinary users and are relatively numerous, annotation subtasks that are relatively simple and place low demands on expertise can be handed to user-annotator task execution terminals, which improves efficiency, while annotation subtasks that place high demands on expertise and are relatively difficult are handed to professional-annotator task execution terminals, which improves accuracy. Overall, this improves efficiency while also ensuring accuracy.
Specifically, if the at least two annotation subtasks include at least a first annotation subtask and a second annotation subtask, where the first annotation subtask is a correct/incorrect judgment annotation task and the second annotation subtask is an error-cause annotation task and/or an error-correction annotation task, then the task execution terminal corresponding to the first annotation subtask is a user-annotator task execution terminal and the task execution terminal corresponding to the second annotation subtask is a professional-annotator task execution terminal.
This is because the correct/incorrect judgment annotation task is usually a relatively simple judgment task that only requires deciding right or wrong. For example, given a piece of speech and the corresponding text, the annotator judges whether the text is consistent with the speech. Because this annotation is relatively simple, it can be carried out on user-annotator task execution terminals, for example by distributing the first annotation subtask to user terminals through a crowdsourcing platform, which greatly improves efficiency. The error-cause annotation task and/or error-correction annotation task places relatively high demands on expertise (the error cause may be, for example, tone or pronunciation), so the second annotation subtask is carried out on professional-annotator task execution terminals, which also ensures accuracy.
In this way, in the embodiments of the present invention, a data annotation task may include at least two annotation subtasks, each carried out on a corresponding task execution terminal. For example, the parts that online users of the crowdsourcing platform can handle are distributed to those online users, and the parts that online users cannot handle are handed to internal professional annotators. This improves efficiency, raises the accuracy of the overall data annotation task by raising the accuracy of each annotation subtask's result, and reduces both the cost of online users and the cost of subsequent operations such as screening and correction.
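A minimal sketch of the first case, assuming each subtask simply carries a flag saying whether it demands professional expertise. The `Subtask` class, the `requires_expertise` flag, and the terminal identifiers are hypothetical names chosen for illustration; the patent does not prescribe how the capability requirement is encoded.

```python
from dataclasses import dataclass

# Hypothetical identifiers for the two kinds of task execution terminal.
USER_TERMINAL = "user_annotator_terminal"
PROFESSIONAL_TERMINAL = "professional_annotator_terminal"


@dataclass
class Subtask:
    name: str
    requires_expertise: bool  # assumed encoding of the capability requirement


def assign_terminal(subtask: Subtask) -> str:
    """Route simple subtasks to user terminals, demanding ones to professionals."""
    return PROFESSIONAL_TERMINAL if subtask.requires_expertise else USER_TERMINAL


# The two subtasks from the description.
judge = Subtask("correct/incorrect judgment", requires_expertise=False)
cause = Subtask("error-cause annotation", requires_expertise=True)
```

A real system would probably also weigh annotation cost, which this sketch leaves out.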
Further, in the embodiments of the present invention, the first annotation subtask and the second annotation subtask may be two associated but different execution stages, and the second annotation subtask may be generated from the annotation results of the first annotation subtask. This may be done in two ways:
First way: if the first annotation subtask is carried out on one task execution terminal, obtain the first annotation result of the first annotation subtask and judge from it whether annotation needs to continue; if so, generate the second annotation subtask from the data in the first annotation subtask that needs further annotation.
Second way: if the first annotation subtask is carried out on multiple task execution terminals, obtain the first annotation results of the first annotation subtask and compare whether the first annotation results from the multiple task execution terminals are identical; if they are identical and the first annotation result further indicates that annotation needs to continue, generate the second annotation subtask from the data in the first annotation subtask that needs further annotation.
For example, if the first annotation subtask is a correct/incorrect judgment annotation task and the first annotation result is "incorrect", it is determined that annotation needs to continue, and the second annotation subtask is generated from the data in the first annotation subtask whose first annotation result is "incorrect".
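Generating the second subtask from the first subtask's results amounts to a filter. The sketch below assumes the first-stage results are available as a mapping from a data item ID to the label "correct" or "incorrect"; the function name and the label strings are illustrative assumptions.

```python
from typing import Dict, List


def build_second_subtask(first_results: Dict[str, str]) -> List[str]:
    """Collect the IDs of items judged "incorrect" in the first subtask.

    These items still need error-cause (or correction) annotation and so
    form the data to be annotated in the second subtask.
    """
    return [item_id for item_id, label in first_results.items()
            if label == "incorrect"]


first_results = {"a1": "correct", "a2": "incorrect", "a3": "incorrect"}
second_subtask_items = build_second_subtask(first_results)
```

Here `second_subtask_items` contains "a2" and "a3", in the order in which they were recorded.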
The number of task execution terminals is determined according to the grades of the users corresponding to the task execution terminals.
For example, users are divided into three grades, A, B, and C, where the annotation accuracy of class-A users is higher than that of class-B users and that of class-B users is higher than that of class-C users. It can be specified that the annotation result of one class-A user is considered reliable, that an annotation result is considered reliable only when two class-B users give the same result, and that an annotation result is considered reliable only when three class-C users give the same result.
It is then necessary to obtain the first annotation result from the task execution terminal of one class-A user, or to determine that the first annotation results from the task execution terminals of two class-B users are identical, or to determine that the first annotation results from the task execution terminals of three class-C users are identical.
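The grade rule (one class-A result, two matching class-B results, or three matching class-C results) could be checked as follows. This is a sketch under assumptions: votes are given as `(grade, label)` pairs, all labels within a grade must match, and higher grades are consulted first. The patent does not state how to resolve conflicts between grades, so that ordering is a choice made here.

```python
from typing import Dict, List, Optional, Tuple

# Number of identical results required per user grade, as in the example above.
REQUIRED_AGREEMENT = {"A": 1, "B": 2, "C": 3}


def reliable_result(votes: List[Tuple[str, str]]) -> Optional[str]:
    """Return a label considered reliable under the grade rule, else None."""
    labels_by_grade: Dict[str, List[str]] = {}
    for grade, label in votes:
        labels_by_grade.setdefault(grade, []).append(label)
    # Assumption: consult the most accurate grade first.
    for grade in ("A", "B", "C"):
        labels = labels_by_grade.get(grade, [])
        enough = len(labels) >= REQUIRED_AGREEMENT[grade]
        if enough and len(set(labels)) == 1:
            return labels[0]
    return None
```

Two class-C users who agree are not sufficient on their own, so the function returns `None` for them until a third matching class-C result arrives.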
2) Second case: the task execution terminals include task execution terminals of different types of annotators; the task execution terminals corresponding to the at least two annotation subtasks are determined according to the types of the annotation subtasks.
The task execution terminals of different types of annotators may include, for example, ordinary-user annotator task execution terminals and professional-user annotator task execution terminals, and all of them may be user terminals.
Specifically, the task execution terminals corresponding to the at least two annotation subtasks are determined according to the types of the at least two annotation subtasks and the annotator types corresponding to the task execution terminals.
The annotator type can be determined from annotator attribute information, for example the basic user information filled in at registration, which may include field of study, occupation, specialty, and so on. Annotation subtasks of different types can thus be distributed to the task execution terminals of different types of annotators, so that each annotation subtask goes to the annotators better suited to completing it. This improves annotation accuracy and reduces the cost of screening annotation results.
For example, a data annotation task includes a first annotation subtask and a second annotation subtask. The first annotation subtask places low demands on expertise and can be completed by any user of the crowdsourcing platform. The second annotation subtask places high demands on expertise; if it were distributed to users unfamiliar with the field, annotation accuracy would be low, most of the annotation results might be unusable, and re-annotation might be needed later, at higher cost. In the embodiments of the present invention, the types of the annotators can be determined, and the two annotation subtasks can be distributed to the annotators of different task execution terminals according to the annotator types and the types of the two subtasks. For example, the first annotation subtask is distributed to all users of the crowdsourcing platform, and the second annotation subtask is distributed to users of the crowdsourcing platform identified as having strong expertise.
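One way to picture type-based distribution is to match a subtask's required field against the fields each annotator declared at registration. Everything named below (`Annotator`, `fields`, `eligible_annotators`, the sample users) is hypothetical and only illustrates the idea.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Annotator:
    user_id: str
    # Assumed to come from registration info: field of study, occupation, specialty.
    fields: Set[str] = field(default_factory=set)


def eligible_annotators(required_field: Optional[str],
                        annotators: List[Annotator]) -> List[str]:
    """Return IDs of annotators who may receive a subtask.

    A subtask with no required field goes to every user; otherwise only to
    users whose declared fields include the required one.
    """
    if required_field is None:
        return [a.user_id for a in annotators]
    return [a.user_id for a in annotators if required_field in a.fields]


pool = [Annotator("u1"), Annotator("u2", {"phonetics"}), Annotator("u3", {"law"})]
```

With this pool, a subtask without a required field reaches all three users, while one that requires "phonetics" reaches only `u2`.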
In the artificial intelligence field, the data volume is large. If a data annotation task were distributed as a whole to all users of the crowdsourcing platform, the annotation accuracy for the parts that demand more expertise would be low and the results might be unusable; subsequent screening and modification would be time-consuming and costly, and the remuneration paid to crowdsourcing platform users would raise the cost further. Therefore, the data annotation task is divided into at least two annotation subtasks, each carried out on the task execution terminals of better-suited annotators, which improves accuracy and reduces cost.
Further, to make the distribution of annotation subtasks more reliable, the grade of each annotator can also be determined from the annotator's behavior information, and the task execution terminals corresponding to the at least two annotation subtasks are determined according to the types of the annotation subtasks together with the grades and/or types of the annotators.
Determining the grade of an annotator from the annotator's behavior information specifically includes: determining the grade from how the annotator executed tasks within a preset historical time period, for example from the accuracy of the executed tasks and the number of tasks executed. The grade of an annotator can characterize the accuracy and reliability with which that user executes tasks.
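The patent names accuracy and task count as inputs to the grade but gives no formula. The sketch below shows one conceivable mapping; the thresholds (50 tasks, 95% and 85% accuracy) and the A/B/C scale are invented purely for illustration.

```python
def annotator_grade(correct: int, total: int, min_tasks: int = 50) -> str:
    """Map an annotator's history in a preset period to a grade.

    All thresholds here are hypothetical: annotators with too little
    history start at the lowest grade, and the rest are ranked by
    accuracy.
    """
    if total < min_tasks:
        return "C"  # not enough history to trust the accuracy figure
    accuracy = correct / total
    if accuracy >= 0.95:
        return "A"
    if accuracy >= 0.85:
        return "B"
    return "C"
```

Under these assumed thresholds, 98 correct out of 100 gives grade A, whereas 10 correct out of 10 gives grade C because the history is too short.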
Based on the above embodiments, after step 110 is executed, the embodiments of the present invention further provide a possible implementation: determining training sample data according to the data annotation result of the data annotation task.
Specifically, if the at least two annotation subtasks include at least a first annotation subtask and a second annotation subtask, where the first annotation subtask is a correct/incorrect judgment annotation task and the second annotation subtask is an error-cause annotation task and/or an error-correction annotation task, then the data whose first annotation result in the first annotation subtask is "correct", together with the data in the second annotation subtask that carries a second annotation result (that is, data carrying an error cause and/or corrected data), can be used as training sample data.
That is, in the embodiments of the present invention, data whose first annotation result is "correct" can be used directly as training sample data. Data whose first annotation result is "incorrect" can serve as the data to be annotated in the second annotation subtask; after this data is annotated it carries the second annotation result and can also be used as training sample data. During model training, the model can then learn not only which data is correct but also what the error cause of the incorrect data is, which improves training accuracy and efficiency.
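Combining the two stages into training samples can be sketched as a merge of two mappings. The dictionary layout, the field names `correct` and `error_cause`, and the decision to skip incorrect items that have no second-stage result yet are all assumptions of this example.

```python
from typing import Dict, List


def build_training_samples(first_results: Dict[str, str],
                           second_results: Dict[str, str]) -> List[dict]:
    """Merge first- and second-subtask results into training samples.

    Items judged "correct" are used directly; items judged "incorrect"
    are used only once they carry a second-stage annotation (here, an
    error cause).
    """
    samples = []
    for item_id, label in first_results.items():
        if label == "correct":
            samples.append({"id": item_id, "correct": True})
        elif item_id in second_results:
            samples.append({"id": item_id, "correct": False,
                            "error_cause": second_results[item_id]})
    return samples


samples = build_training_samples(
    {"a1": "correct", "a2": "incorrect", "a3": "incorrect"},
    {"a2": "pronunciation"},
)
```

In this run `a1` becomes a positive sample, `a2` becomes a negative sample with its error cause, and `a3` is held back until professionals annotate it.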
Above-described embodiment is further described using concrete application scene below.Using data mark task as crowdsourcing number According to mark task, the first mark subtask is that the positive erroneous judgement of data marks task, and the second mark subtask is error in data reason Mark task and/or the data that right a wrong mark task, and the first mark corresponding task execution terminal in subtask is user It marks worker tasks and executes terminal, the second mark corresponding task execution terminal in subtask is that profession mark worker tasks execute end For end.
For example, being directed to certain artificial smart machine, the realization of each function requires to be tested repeatedly, such as voice is known Other function, function of search etc. need to carry out model training by a large amount of training sample data, to improve accuracy, for instruction The acquisition for practicing sample data can be realized by crowdsourcing task platform, execute mark by user a large amount of in crowdsourcing task platform Note task, wherein each function of artificial intelligence equipment realizes that corresponding data mark task has very much, such as voice recognition tasks Mark, parsing task mark, search mission mark, chat task mark etc., it is only right by taking voice recognition tasks mark as an example here The embodiment of the present invention is illustrated.
Voice recognition tasks mark, marks job description are as follows: is labeled to whether the text that audio identification goes out corresponds to, no Corresponding marking error reason.
First, the speech recognition annotation task can be divided into a first annotation subtask and a second annotation subtask, where the first annotation subtask is a correct/incorrect judgment task and the second annotation subtask is a data error-cause annotation task.
Then, the first annotation subtask can be published online on the crowdsourcing task platform and distributed to all user-annotator task execution terminals on the platform.
Fig. 2 is a schematic diagram of the user-facing front-end interface for the first annotation subtask in the embodiment of the present invention. As shown in Fig. 2, the interface provides an audio playback function: the user plays the audio by clicking the start/pause button. The interface also provides a text box containing the text recognized from the audio, together with selection buttons for correct and incorrect, for example "√" for correct and "×" for incorrect. The user plays the audio, judges whether it is consistent with the text in the text box, and selects correct or incorrect accordingly.
Finally, the first annotation results of the first annotation subtask returned by each user-annotator task execution terminal on the crowdsourcing task platform are obtained, and the second annotation subtask is generated from the data whose first annotation result is "incorrect". The second annotation subtask is not published online; it is executed on professional-annotator task execution terminals, that is, handed to in-house professional annotators. The second annotation results of the second annotation subtask returned by the professional-annotator task execution terminals are then obtained.
At this point, only the data judged incorrect in the first annotation subtask remains, so the data volume is greatly reduced. Handing this data to in-house professional annotators for error-cause annotation reduces cost and yields higher accuracy. In addition, a reward needs to be paid to the user-annotator task execution terminals on the crowdsourcing task platform only for the first annotation subtask. Because a correct/incorrect judgment is a comparatively simple task, the reward is comparatively small, so the payment cost on the crowdsourcing task platform is also reduced.
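The two-stage flow of this scenario can be sketched in Python as follows. This is only an illustrative sketch, not the patent's implementation: the names `Sample`, `run_two_stage_annotation`, `crowd_judge`, and `expert_cause` are hypothetical, and the two callables stand in for whatever the user-annotator and professional-annotator task execution terminals actually return.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Sample:
    """One item to annotate (all field names are hypothetical)."""
    sample_id: str
    audio_path: str       # assumed location of the audio clip
    recognized_text: str  # text produced by speech recognition


def run_two_stage_annotation(
    samples: List[Sample],
    crowd_judge: Callable[[Sample], bool],
    expert_cause: Callable[[Sample], str],
) -> Dict[str, Dict[str, Optional[str]]]:
    """Run the first subtask on every sample and the second subtask
    only on the samples that the first subtask judged incorrect."""
    # First annotation subtask: correct/incorrect judgment by crowd users.
    first = {s.sample_id: crowd_judge(s) for s in samples}

    # Second annotation subtask: generated from the "incorrect" data only
    # and handled by professional annotators.
    wrong = [s for s in samples if not first[s.sample_id]]
    second = {s.sample_id: expert_cause(s) for s in wrong}

    # Combine both subtask results into the final data annotation result.
    return {
        s.sample_id: {
            "judgment": "correct" if first[s.sample_id] else "incorrect",
            "error_cause": second.get(s.sample_id),
        }
        for s in samples
    }
```

Because `expert_cause` is called only for samples judged incorrect, the amount of work sent to professional annotators shrinks with the error rate, which is the cost argument made above.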
Based on the above embodiments, Fig. 3 is a schematic structural diagram of the data annotation apparatus in the embodiment of the present invention, which specifically includes:
An acquisition module 30, configured to obtain the annotation results of at least two annotation subtasks included in a data annotation task, where the at least two annotation subtasks are each executed on a corresponding task execution terminal;
A determining module 31, configured to determine the data annotation result of the data annotation task based on the annotation results of the at least two annotation subtasks.
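One way to picture the two modules is the Python sketch below. It is an assumption-laden illustration rather than the apparatus itself: the class names, the `fetchers` mapping (one callable per subtask, standing in for a task execution terminal), and the per-item merge rule are all hypothetical choices made for the example.

```python
from typing import Callable, Dict

# Hypothetical shape: item id -> label returned for one subtask.
SubtaskResult = Dict[str, str]


class AcquisitionModule:
    """Collects the annotation results of at least two subtasks."""

    def __init__(self, fetchers: Dict[str, Callable[[], SubtaskResult]]):
        if len(fetchers) < 2:
            raise ValueError("at least two annotation subtasks are required")
        self._fetchers = fetchers

    def acquire(self) -> Dict[str, SubtaskResult]:
        # Each fetcher stands in for a task execution terminal.
        return {name: fetch() for name, fetch in self._fetchers.items()}


class DeterminationModule:
    """Derives the overall result from the subtask results."""

    def determine(
        self, subtask_results: Dict[str, SubtaskResult]
    ) -> Dict[str, Dict[str, str]]:
        merged: Dict[str, Dict[str, str]] = {}
        for subtask_name, per_item in subtask_results.items():
            for item_id, label in per_item.items():
                # Assumed merge rule: group every subtask label by item.
                merged.setdefault(item_id, {})[subtask_name] = label
        return merged
```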
Optionally, the task execution terminals include user-annotator task execution terminals and professional-annotator task execution terminals; the task execution terminals corresponding to the at least two annotation subtasks are determined according to the capability requirements and/or annotation cost of the annotation subtasks.
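The text does not say how capability requirements and cost are weighed against each other. As a minimal sketch under that caveat, the Python below assumes a numeric skill score and a per-item cost (both hypothetical) and picks the cheapest kind of terminal that is capable enough for a subtask.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TerminalKind:
    name: str
    skill_level: int      # hypothetical capability score
    cost_per_item: float  # hypothetical reward paid per annotated item


@dataclass(frozen=True)
class Subtask:
    name: str
    required_skill: int   # hypothetical capability requirement


def choose_terminal_kind(subtask: Subtask, kinds: List[TerminalKind]) -> TerminalKind:
    """Return the cheapest terminal kind that meets the subtask's requirement."""
    capable = [k for k in kinds if k.skill_level >= subtask.required_skill]
    if not capable:
        raise ValueError(f"no terminal kind can handle subtask {subtask.name!r}")
    return min(capable, key=lambda k: k.cost_per_item)
```

With a low-cost user-annotator kind and a higher-cost professional kind, a simple judgment subtask would go to the former and an error-cause subtask to the latter, matching the split described in the scenario.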
Optionally, the at least two annotation subtasks include at least a first annotation subtask and a second annotation subtask; the first annotation subtask is a correct/incorrect judgment task, and the second annotation subtask is a data error-cause annotation task and/or a data error-correction annotation task; the task execution terminal corresponding to the first annotation subtask is a user-annotator task execution terminal, and the task execution terminal corresponding to the second annotation subtask is a professional-annotator task execution terminal.
Optionally, the task execution terminals include task execution terminals for different types of annotators; the task execution terminals corresponding to the at least two annotation subtasks are determined according to the types of the annotation subtasks.
Optionally, each annotation subtask is executed on one or more task execution terminals.
Optionally, if an annotation subtask is executed on multiple task execution terminals, the annotation result of that annotation subtask is determined as follows:
The acquisition module 30 is specifically configured to obtain the multiple annotation results produced for the annotation subtask on the multiple task execution terminals;
The determining module 31 is specifically configured to, if the multiple annotation results are consistent, determine that consistent annotation result to be the annotation result of the annotation subtask.
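The consistency rule for a subtask executed on multiple terminals can be sketched as a small Python function. The function name is hypothetical, and returning `None` when the results disagree (or when there are none) is an assumption: the text only defines the outcome for the consistent case.

```python
from typing import Hashable, Optional, Sequence


def consistent_result(results: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the shared label if all terminals agree, otherwise None.

    Handling of disagreement is not specified in the text; None is used
    here only as a placeholder for "no result determined".
    """
    if not results:
        return None
    first = results[0]
    return first if all(r == first for r in results) else None
```

For example, three terminals that all return "correct" yield "correct", while a mix of "correct" and "incorrect" yields no determined result in this sketch.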
Optionally, the data annotation task is a speech-related data annotation task.
Based on the above embodiments, Fig. 4 is a schematic structural diagram of an electronic device in the embodiment of the present invention.
The embodiment of the present invention provides an electronic device, which may include a processor 410 (Central Processing Unit, CPU), a memory 420, an input device 430, an output device 440, and the like. The input device 430 may include a keyboard, a mouse, a touch screen, and the like; the output device 440 may include a display device such as a liquid crystal display (LCD) or a cathode-ray tube (CRT).
The memory 420 may include read-only memory (ROM) and random access memory (RAM), and provides the processor 410 with the program instructions and data stored in the memory 420. In the embodiment of the present invention, the memory 420 may be used to store the program of the data annotation method.
The processor 410 calls the program instructions stored in the memory 420 and, according to the obtained program instructions, executes the following:
Obtaining the annotation results of at least two annotation subtasks included in a data annotation task, where the at least two annotation subtasks are each executed on a corresponding task execution terminal;
Determining the data annotation result of the data annotation task based on the annotation results of the at least two annotation subtasks.
Optionally, the task execution terminals include user-annotator task execution terminals and professional-annotator task execution terminals; the task execution terminals corresponding to the at least two annotation subtasks are determined according to the capability requirements and/or annotation cost of the annotation subtasks.
Optionally, the at least two annotation subtasks include at least a first annotation subtask and a second annotation subtask; the first annotation subtask is a correct/incorrect judgment task, and the second annotation subtask is a data error-cause annotation task and/or a data error-correction annotation task; the task execution terminal corresponding to the first annotation subtask is a user-annotator task execution terminal, and the task execution terminal corresponding to the second annotation subtask is a professional-annotator task execution terminal.
Optionally, the task execution terminals include task execution terminals for different types of annotators; the task execution terminals corresponding to the at least two annotation subtasks are determined according to the types of the annotation subtasks.
Optionally, each annotation subtask is executed on one or more task execution terminals.
Optionally, if an annotation subtask is executed on multiple task execution terminals, the annotation result of that annotation subtask is determined as follows; the processor 410 is specifically configured to:
Obtain the multiple annotation results produced for the annotation subtask on the multiple task execution terminals;
If the multiple annotation results are consistent, determine that consistent annotation result to be the annotation result of the annotation subtask.
Optionally, the data annotation task is a speech-related data annotation task.
Based on the above embodiments, the embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the data annotation method of any of the above method embodiments.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various modifications and variations to the embodiments of the present invention without departing from the spirit and scope of the embodiments of the present invention. Thus, if these modifications and variations of the embodiments of the present invention fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include these modifications and variations.

Claims (10)

1. A data annotation method, characterized by comprising:
Obtaining the annotation results of at least two annotation subtasks included in a data annotation task, where the at least two annotation subtasks are each executed on a corresponding task execution terminal;
Determining the data annotation result of the data annotation task based on the annotation results of the at least two annotation subtasks.
2. The method according to claim 1, characterized in that the task execution terminals include user-annotator task execution terminals and professional-annotator task execution terminals; the task execution terminals corresponding to the at least two annotation subtasks are determined according to the capability requirements and/or annotation cost of the annotation subtasks.
3. The method according to claim 2, characterized in that the at least two annotation subtasks include at least a first annotation subtask and a second annotation subtask; the first annotation subtask is a correct/incorrect judgment task, and the second annotation subtask is a data error-cause annotation task and/or a data error-correction annotation task; the task execution terminal corresponding to the first annotation subtask is a user-annotator task execution terminal, and the task execution terminal corresponding to the second annotation subtask is a professional-annotator task execution terminal.
4. The method according to claim 1, characterized in that the task execution terminals include task execution terminals for different types of annotators; the task execution terminals corresponding to the at least two annotation subtasks are determined according to the types of the annotation subtasks.
5. The method according to claim 1, characterized in that each annotation subtask is executed on one or more task execution terminals.
6. The method according to claim 5, characterized in that, if an annotation subtask is executed on multiple task execution terminals, the annotation result of that annotation subtask is determined as follows:
Obtaining the multiple annotation results produced for the annotation subtask on the multiple task execution terminals;
If the multiple annotation results are consistent, determining that consistent annotation result to be the annotation result of the annotation subtask.
7. The method according to any one of claims 1-6, characterized in that the data annotation task is a speech-related data annotation task.
8. A data annotation apparatus, characterized by comprising:
An acquisition module, configured to obtain the annotation results of at least two annotation subtasks included in a data annotation task, where the at least two annotation subtasks are each executed on a corresponding task execution terminal;
A determining module, configured to determine the data annotation result of the data annotation task based on the annotation results of the at least two annotation subtasks.
9. An electronic device, characterized by comprising:
At least one memory, configured to store program instructions;
At least one processor, configured to call the program instructions stored in the memory and to execute the method of any one of claims 1-7 according to the obtained program instructions.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1-7.
CN201810786768.4A 2018-07-17 2018-07-17 A kind of data mask method, device, electronic equipment and storage medium Pending CN108984490A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810786768.4A CN108984490A (en) 2018-07-17 2018-07-17 A kind of data mask method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN108984490A true CN108984490A (en) 2018-12-11

Family

ID=64548370

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810786768.4A Pending CN108984490A (en) 2018-07-17 2018-07-17 A kind of data mask method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN108984490A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109710933A (en) * 2018-12-25 2019-05-03 广州天鹏计算机科技有限公司 Acquisition methods, device, computer equipment and the storage medium of training corpus
CN111160044A (en) * 2019-12-31 2020-05-15 出门问问信息科技有限公司 Text-to-speech conversion method and device, terminal and computer readable storage medium
CN112749308A (en) * 2019-10-31 2021-05-04 北京国双科技有限公司 Data labeling method and device and electronic equipment
CN112884303A (en) * 2021-02-02 2021-06-01 深圳市欢太科技有限公司 Data annotation method and device, electronic equipment and computer readable storage medium
CN113421591A (en) * 2021-06-30 2021-09-21 平安科技(深圳)有限公司 Voice labeling method, device, equipment and storage medium
CN113539446A (en) * 2020-04-21 2021-10-22 杭州普健医疗科技有限公司 CT image labeling method and system, storage medium and terminal

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130081053A1 (en) * 2011-09-23 2013-03-28 Elwha LLC, a limited liability company of the State of Delaware Acquiring and transmitting tasks and subtasks to interface devices
CN103324620A (en) * 2012-03-20 2013-09-25 北京百度网讯科技有限公司 Method and device for rectifying marking results
CN103530282A (en) * 2013-10-23 2014-01-22 北京紫冬锐意语音科技有限公司 Corpus tagging method and equipment
CN107705034A (en) * 2017-10-26 2018-02-16 医渡云(北京)技术有限公司 Mass-rent platform implementation method and device, storage medium and electronic equipment
CN107729378A (en) * 2017-07-13 2018-02-23 华中科技大学 A kind of data mask method
CN108268575A (en) * 2017-01-04 2018-07-10 阿里巴巴集团控股有限公司 Processing method, the device and system of markup information

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
仝子飞: "通用众包标注系统的设计与实现", 《中国优秀博硕士学位论文全文数据库(硕士) 信息科技辑》 *
孙欢: "众包标注的学习算法研究", 《中国优秀博硕士学位论文全文数据库(硕士) 信息科技辑》 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181211
