CN105404896A - Annotation data processing method and annotation data processing system - Google Patents


Info

Publication number
CN105404896A
Authority
CN
China
Prior art keywords
annotation
annotation results
spot check
similarity
annotation data
Legal status
Granted
Application number
CN201510744484.5A
Other languages
Chinese (zh)
Other versions
CN105404896B (en)
Inventor
许欣然
高宇
印奇
Current Assignee
Beijing Megvii Technology Co Ltd
Beijing Aperture Science and Technology Ltd
Original Assignee
Beijing Megvii Technology Co Ltd
Beijing Aperture Science and Technology Ltd
Application filed by Beijing Megvii Technology Co Ltd and Beijing Aperture Science and Technology Ltd
Priority to CN201510744484.5A
Publication of CN105404896A
Application granted
Publication of CN105404896B
Status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an annotation data processing method and an annotation data processing system. The method comprises: step S110, calculating the similarity of multiple annotation results related to an annotation task; step S120, comparing the similarity with a similarity threshold, proceeding to step S130 if the similarity is greater than or equal to the threshold and to step S140 if it is less than the threshold; step S130, determining that the multiple annotation results pass the quality check; and step S140, determining that the multiple annotation results fail the quality check. Because the method and system use similarity to check the quality of annotation results automatically, annotators can learn the quality of their results promptly and correct annotation errors in time, which can effectively improve annotation accuracy.

Description

Annotation data processing method and annotation data processing system
Technical field
The present invention relates to the field of data processing, and in particular to an annotation data processing method and an annotation data processing system.
Background
Training a machine (that is, machine learning) usually requires a large amount of annotated data as a training set, and the larger the volume of annotated data, the more it helps training. How to annotate data efficiently and accurately has therefore become a pressing problem. The annotation workflow of existing data annotation systems is generally: publish an annotation task containing one or more annotation units, have people annotate it manually, and then perform a manual quality check. Existing systems rely entirely on manual quality checks to control annotation accuracy, so the interval between the completion of manual annotation and the quality check may be very long, which makes it difficult to correct annotators' mistakes in time.
Summary of the invention
In view of the above problems, the present invention provides an annotation data processing method and an annotation data processing system that at least partially solve them.
According to one aspect of the present invention, an annotation data processing method is provided, comprising: step S110, calculating the similarity of multiple annotation results related to an annotation task; step S120, comparing the similarity with a similarity threshold, proceeding to step S130 if the similarity is greater than or equal to the similarity threshold and to step S140 if it is less than the similarity threshold; step S130, determining that the multiple annotation results pass the quality check; and step S140, determining that the multiple annotation results fail the quality check.
According to another aspect of the invention, an annotation data processing system is provided, comprising a calculation device, a similarity comparison device, a first execution device and a second execution device. The calculation device calculates the similarity of multiple annotation results related to an annotation task. The similarity comparison device compares the similarity with a similarity threshold, starting the first execution device if the similarity is greater than or equal to the threshold and the second execution device if it is less. The first execution device determines that the multiple annotation results pass the quality check. The second execution device determines that the multiple annotation results fail the quality check.
Because the annotation data processing method and system provided by the invention use similarity to check the quality of annotation results automatically, annotators can learn the quality of their results promptly and correct annotation errors in time, which can effectively improve annotation accuracy.
The above is only an overview of the technical solution of the invention. Specific embodiments are given below so that the technical means of the invention can be understood more clearly and implemented according to the specification, and so that the above and other objects, features and advantages of the invention become more apparent.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art on reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be regarded as limiting the invention. Throughout the drawings, identical parts are denoted by identical reference signs. In the drawings:
Fig. 1 is a flowchart of an annotation data processing method according to one embodiment of the invention;
Fig. 2 is a flowchart of an annotation data processing method according to another embodiment of the invention;
Fig. 3 is a flowchart of an annotation data processing method according to yet another embodiment of the invention; and
Fig. 4 is a schematic block diagram of an annotation data processing system according to one embodiment of the invention.
Detailed description of the embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure can be implemented in various forms and is not limited to the embodiments set forth here. Rather, these embodiments are provided so that the disclosure is understood more thoroughly and its scope is fully conveyed to those skilled in the art.
According to one aspect of the present invention, an annotation data processing method is provided. Fig. 1 shows a flowchart of an annotation data processing method 100 according to one embodiment of the invention.
As shown in Fig. 1, the annotation data processing method 100 comprises the following steps.
Step S110: calculate the similarity of multiple annotation results related to an annotation task. An annotation task, as used herein, is a task comprising an object to be annotated and an annotation requirement. The object to be annotated, also called an "annotation unit", may be one image or a group of images, a video, audio, and so on. The annotation requirement is information that tells annotators how to annotate the annotation unit. For example, the annotation unit may be an image containing several faces, and the annotation requirement may instruct annotators to draw boxes around all faces in the image or to mark several key points on each face. After accepting the task, an annotator draws boxes around the faces or marks the facial key points in the image. Annotated face images can be used for machine training in various face recognition applications.
These are, of course, only examples of an annotation unit and an annotation requirement and do not limit the invention; several other examples follow. The annotation unit may be an image with other content, such as text (a trademark, a license plate number, etc.), animals or objects; correspondingly, the requirement may instruct annotators to mark all text, animals or objects in the image. The annotation unit may be an image containing people, with a requirement to determine the gender, ethnicity or age of the people in the image. The annotation unit may be a group of images consisting of one image containing a reference object and several candidate images, with a requirement to select from the candidates a specific image containing an object identical or similar to the reference object. The annotation unit may be an audio clip, with a requirement to type the words spoken in it. The annotation unit may also be a question and a set of options, with a requirement to select from the set the option appropriate to the question.
An annotation result is the result data an annotator produces after annotating an annotation unit according to the annotation requirement. For example, an annotation result may contain information about the facial key points the annotator marked, such as the position of each key point in the image. According to embodiments of the invention, one annotation task can be accepted and worked on by multiple annotators. Each annotator provides one annotation result for a task, so multiple annotation results related to the task are obtained, and the similarity among them can then be calculated. The calculation may differ for different types of annotation results. The following describes how to calculate the similarity of two annotation results; it can be expressed as a single number, computed by methods including but not limited to the following:
If the task is to mark several points on an image, the sum of the Euclidean distances between corresponding points in the two annotation results can be calculated as their similarity (note that a smaller sum means the two results agree more closely);
If the task is to mark several polygons on an image, the ratio of the intersection area to the union area of the polygons in the two results (intersection over union, IoU) can be calculated as their similarity;
If the task is to select one option from several, the similarity of two annotation results is 1 if they agree and 0 if they do not;
If the task is to select more than one option from several, the ratio of the number of options common to the two results (the intersection of their options) to the number of all options in the two results (the union of their options) can be calculated as their similarity.
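The four pairwise measures above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: `box_iou` handles only axis-aligned rectangles rather than arbitrary polygons, and `point_similarity` with its `scale` parameter is a hypothetical mapping that turns the distance sum (where smaller means more alike) into a score where larger means more alike, so that it can be compared against a threshold as in step S120.

```python
import math
from typing import Sequence, Set, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def point_distance_sum(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Sum of Euclidean distances between corresponding key points."""
    if len(a) != len(b):
        raise ValueError("both results must mark the same number of points")
    return sum(math.dist(p, q) for p, q in zip(a, b))


def point_similarity(a: Sequence[Point], b: Sequence[Point], scale: float = 10.0) -> float:
    """Hypothetical mapping of the distance sum to (0, 1]; larger means more similar."""
    return 1.0 / (1.0 + point_distance_sum(a, b) / scale)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned rectangles (simplified polygon case)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def single_choice_similarity(a: str, b: str) -> float:
    """1 if both annotators chose the same option, otherwise 0."""
    return 1.0 if a == b else 0.0


def multi_choice_similarity(a: Set[str], b: Set[str]) -> float:
    """|intersection| / |union| of the selected options (Jaccard index)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

For example, two rectangles `(0, 0, 2, 2)` and `(1, 0, 3, 2)` overlap in an area of 2 out of a union of 6, giving an IoU of one third.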
From the above description of the similarity of two annotation results, it can be understood how the similarity of more than two annotation results is calculated; this can be done with conventional techniques and is not described further here.
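One conventional way to extend a pairwise measure to more than two results, assumed here rather than taken from the patent, is to compute the similarity of every pair and aggregate the scores, either by their mean or, more strictly, by the least similar pair. `group_similarity` and its `mode` parameter are hypothetical names.

```python
from itertools import combinations
from statistics import mean
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def group_similarity(results: Sequence[T],
                     pair_sim: Callable[[T, T], float],
                     mode: str = "mean") -> float:
    """Aggregate the pairwise similarities of two or more annotation results.

    mode="mean" averages over all pairs; mode="min" uses the least similar pair,
    which is the stricter choice.
    """
    if len(results) < 2:
        raise ValueError("need at least two annotation results")
    scores = [pair_sim(a, b) for a, b in combinations(results, 2)]
    if mode == "mean":
        return mean(scores)
    if mode == "min":
        return min(scores)
    raise ValueError(f"unknown mode: {mode!r}")
```

With three single-choice results `["A", "A", "B"]`, one of the three pairs agrees, so the mean is one third while the minimum is 0.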
Step S120: compare the similarity with a similarity threshold; if the similarity is greater than or equal to the threshold, go to step S130; if it is less, go to step S140. The similarity threshold can be any suitable value, for example 80% or more, such as 85%, 90% or 95%; it can be set as needed, and the invention does not limit it. The threshold may initially be set to a default value and later adjusted automatically according to actual needs. Different annotation tasks may have the same or different thresholds: for example, a relatively simple task may have a larger threshold, and a relatively complex task a smaller one. The multiple annotation results of the same task are compared and the similarity among them is calculated; it is then judged whether this similarity is greater than or equal to the threshold, as shown in Fig. 1, and step S130 or step S140 is executed depending on the outcome.
Step S130: determine that the multiple annotation results pass the quality check. If the similarity of the multiple annotation results is greater than or equal to the similarity threshold, the annotations that multiple annotators made on the same annotation unit are fairly similar, so the results are all likely to be accurate. In this case the results can all be regarded as correct annotations of the unit, that is, as the correct outcome of the task; their quality can be considered assured, and they are determined to pass the quality check.
Step S140: determine that the multiple annotation results fail the quality check. If the similarity among the multiple annotation results is less than the similarity threshold, the annotations that multiple annotators made on the same unit differ considerably, so there is a high probability that some of them are wrong. In this case the results are not regarded as correct annotations of the unit; their quality cannot be assured, and they are determined to fail the quality check. It will be appreciated that the quality check described here assesses annotation quality through similarity and therefore reflects it only to a certain extent; results that pass can be further verified manually. Checking annotation quality automatically through similarity is more efficient and can improve annotation accuracy.
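Steps S110 to S140 can be sketched as a single function, assuming the group similarity is the mean of all pairwise similarities and that the caller supplies a pairwise measure such as those listed earlier. The names `check_quality` and `QualityResult` are hypothetical, and the default threshold of 0.9 is just one of the example values given for step S120.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


@dataclass
class QualityResult:
    similarity: float
    passed: bool


def check_quality(results: Sequence[T],
                  pair_sim: Callable[[T, T], float],
                  threshold: float = 0.9) -> QualityResult:
    # S110: similarity of the multiple results (assumed: mean over all pairs)
    pairs = list(combinations(results, 2))
    if not pairs:
        raise ValueError("need at least two annotation results")
    similarity = sum(pair_sim(a, b) for a, b in pairs) / len(pairs)
    # S120: compare with the similarity threshold
    if similarity >= threshold:
        return QualityResult(similarity, passed=True)   # S130: pass
    return QualityResult(similarity, passed=False)      # S140: fail
```

A per-task threshold, larger for simple tasks and smaller for complex ones as the text suggests, would simply be passed in as `threshold`.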
Because the annotation data processing method provided by the invention uses similarity to check the quality of annotation results automatically, annotators can learn the quality of their results promptly and correct annotation errors in time, which can effectively improve annotation accuracy.
Fig. 2 shows a flowchart of an annotation data processing method 200 according to another embodiment of the invention. Steps S110, S120, S130 and S140 in Fig. 2 are similar to those in Fig. 1 and are not described again. In this embodiment, the method 200 may further include the following steps before step S110.
Step S102: obtain a given number of annotation results related to an annotation task. An annotator can send a request to the annotation data processing system through an interactive device; the system selects, from the to-be-annotated queue it maintains, a task suitable for that annotator and sends it to the annotator. The annotation result the annotator produces on finishing can then be obtained.
Step S104: determine whether the given number equals a quantity threshold related to the annotation task; if so, go to step S106, otherwise go to step S102. Different annotation tasks may correspond to, that is be associated with, different quantity thresholds. For a relatively simple task, annotators are less likely to make mistakes, so fewer annotators are needed and the task can have a smaller quantity threshold. Conversely, for a complex task the probability of mistakes may be higher, so more annotators are needed and the task can have a larger quantity threshold. For a new task the initial quantity threshold can be small, for example 2, and adjusted later as needed. Each time an annotator provides an annotation result for a task, the result is stored, and it is checked whether the number of stored results has reached the quantity threshold: if not, step S110 is not executed and step S102 is repeated; once it has, execution of step S110 begins.
Step S106: take the given number of annotation results as the multiple annotation results and go to step S110. As described above, step S110 can start once the number of annotation results reaches the quantity threshold, that is, once the number of annotators who have accepted and worked on the same task reaches the required number. In addition, a quality-check queue can be maintained: a task whose number of annotation results equals the quantity threshold is removed from the to-be-annotated queue and added to the quality-check queue, and is temporarily no longer sent to other annotators.
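The queue handling of steps S102 to S106 could look roughly like this. The `Task` and `Dispatcher` classes are hypothetical; the sketch only stores each incoming result and moves the task from the to-be-annotated queue to the quality-check queue once its number of results equals its quantity threshold (initially 2, as suggested above).

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List


@dataclass(eq=False)                  # compare tasks by identity inside the queues
class Task:
    task_id: str
    quantity_threshold: int = 2       # small initial value for new tasks, as in the text
    results: List[object] = field(default_factory=list)


class Dispatcher:
    """Hypothetical holder of the to-be-annotated and quality-check queues."""

    def __init__(self) -> None:
        self.to_annotate: Deque[Task] = deque()
        self.quality_check: Deque[Task] = deque()

    def submit(self, task: Task, result: object) -> bool:
        """Store one annotation result (S102) and test the count (S104).

        Returns True when the task has just reached its quantity threshold and
        was moved to the quality-check queue (S106), i.e. S110 can start.
        """
        task.results.append(result)
        if len(task.results) == task.quantity_threshold and task in self.to_annotate:
            self.to_annotate.remove(task)     # stop sending it to other annotators
            self.quality_check.append(task)
            return True
        return False
```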
Using a quantity threshold to control how many annotators accept and work on the same task, as described above, makes reasonable use of annotation labor and avoids unnecessary, pointless annotation as far as possible, which can effectively improve labor utilization.
Fig. 3 shows a flowchart of an annotation data processing method 300 according to yet another embodiment of the invention. Steps S102, S104, S106, S110, S120 and S130 in Fig. 3 are similar to those in the preceding embodiments and are not described again. In this embodiment, step S140 can include step S1402 and step S1404. Step S1402 is the step described above of determining that the multiple annotation results fail the quality check. In addition to step S1402, step S140 can further include step S1404: increase the quantity threshold and go to step S102.
When the multiple annotation results are determined to fail the quality check, the quantity threshold can be increased and more annotation results obtained for the quality check. As the number of annotation results grows, the similarity of all the results may increase and their overall quality may improve. The amount by which the quantity threshold is increased each time can be set as needed; the invention does not limit it. It will be appreciated that when the quantity threshold is increased and new annotation results are to be obtained, the task can be removed from the quality-check queue and re-added to the to-be-annotated queue. New annotators can be brought in, each annotating the task only once. Any one or more of the results from the previous quality check can be retained, or only new results can be used. As noted above, the quantity threshold of a new task can initially be set small, for example 2. A relatively simple task may pass the quality check without any increase in the threshold or after only a small increase, so the number of annotations it needs before passing may be small; repeated annotation of simple tasks is thus minimized, saving annotation labor. For a relatively complex task, the quantity threshold can keep increasing, and with it the number of annotation results obtained, until the task passes the quality check; this improves the final quality of the results for complex tasks and thus annotation accuracy. The method of automatically adjusting the quantity threshold provided by the invention therefore uses annotation labor as efficiently as possible while relying on reasonable repeated annotation by multiple people to improve annotation quality. This differs from the prior art.
In existing data annotation methods, improving accuracy usually requires multiplying the number of annotations of the same task. However, how many times each task should be annotated cannot be determined or reasonably adjusted and can only be estimated from theory or experience. This can easily lead to an inappropriate number of annotations for some tasks, wasting labor or lowering annotation accuracy.
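The loop of Fig. 3 can be sketched as follows, under stated assumptions: results from earlier rounds are retained (one of the options the text allows), the group similarity is the mean pairwise similarity, the quantity threshold grows by a fixed `step`, and a hypothetical `max_count` cap stops the loop for tasks that never converge. `next_result` is a stand-in for dispatching the task to a new annotator; none of these names come from the patent.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple


def mean_pairwise(results: Sequence[object],
                  pair_sim: Callable[[object, object], float]) -> float:
    pairs = list(combinations(results, 2))
    return sum(pair_sim(a, b) for a, b in pairs) / len(pairs)


def annotate_until_pass(next_result: Callable[[], object],
                        pair_sim: Callable[[object, object], float],
                        sim_threshold: float,
                        quantity_threshold: int = 2,
                        step: int = 1,
                        max_count: int = 10) -> Tuple[bool, List[object], int]:
    """Return (passed, results, final quantity threshold)."""
    results: List[object] = []
    while True:
        # S102-S106: request results (each from a new annotator) until the threshold is met
        while len(results) < quantity_threshold:
            results.append(next_result())
        # S110-S120: similarity check
        if mean_pairwise(results, pair_sim) >= sim_threshold:
            return True, results, quantity_threshold            # S130
        # S1402: failed; S1404: raise the quantity threshold and go back to S102
        if quantity_threshold + step > max_count:
            return False, results, quantity_threshold           # hypothetical cap
        quantity_threshold += step
```

With the incoming answers `A, B, A, A` and exact-match similarity, the mean pairwise similarity is 0 with two results, one third with three and 0.5 with four, so a similarity threshold of 0.5 is reached once the quantity threshold has grown to 4.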
Optionally, step S140 may further include: sending the multiple annotation results for inspection by a supervisor; receiving inspection feedback information; and sending the inspection feedback information to inform the annotators. When the similarity of the multiple annotation results is low, so that they fail the quality check, they can be sent to a supervisor for inspection. After inspecting them, the supervisor can provide inspection feedback (that is, review comments). For example, in a face annotation task, some annotators may draw face boxes over a larger area and others over a smaller one; the supervisor can point out whose annotation better meets the requirement and how large the appropriate area should be, and such information is the inspection feedback. It will be appreciated that the supervisor's inspection of annotation results can also be regarded as an annotation process in which the annotation unit is the annotation result and the annotation requirement may be, for example, to point out its unreasonable parts; the supervisor can therefore also be regarded as an annotator. The feedback can be returned to the annotation data processing system through an interactive device, and the system forwards it to all annotators who worked on the task for reference, guiding them to provide more accurate annotation results.
Optionally, step S130 may further include: averaging the multiple annotation results to obtain an average annotation result related to the annotation task, the average annotation result being used for spot checks. When the multiple annotation results pass the quality check, the average annotation result can be determined and stored for later spot checking by spot-checkers. It will be appreciated that the annotation data processing system can likewise maintain a spot-check queue: tasks that pass the quality check are removed from the quality-check queue and stored, and some of the stored tasks can later be selected for spot checks.
The average annotation result can be determined in ways including but not limited to the following:
If the task is to mark several points on an image, the midpoints of corresponding points in the multiple annotation results can be calculated as the average annotation result;
If the task is to mark several points on every frame of a video, the midpoints of corresponding points in the corresponding frames of the multiple annotation results can be calculated as the average annotation result;
If the task is to mark several polygons on an image, the intersection of corresponding polygons in the multiple annotation results, or the midpoints of corresponding points on corresponding polygons, can be calculated as the average annotation result. Note that for tasks in which a polygon is annotated by marking points on it, different annotators may mark the points in different orders (for example, some clockwise and some counterclockwise). In that case the points each annotator marked can first be put into correspondence according to that annotator's marking order, and the average annotation result calculated afterwards;
If the task is to select one option from several, the multiple annotation results should all agree on the unique correct option, which is taken as the average annotation result;
If the task is to select more than one option from several, either the options common to all results or all options appearing in any result can be taken as the average annotation result. Taking the common options makes the final annotation result more accurate, while taking all options makes it less accurate; the appropriate scheme can be chosen as needed;
If the task is to type a specific piece of text, all the text contained in the multiple annotation results can be taken as the average annotation result;
If the task is to mark the age range of a person in an image, the common range (the intersection of the age ranges) or the total range (the union of the age ranges) in the multiple results can be calculated as the average annotation result;
If the task is to mark the face orientation, head orientation, angle, etc. in an image, the mean of the face orientations, head orientations, angles, etc. in the multiple results can be calculated as the average annotation result.
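Several of the averaging rules above are sketched below. `average_points` takes the coordinate-wise mean of corresponding points, and `align_orientation` uses the shoelace signed area to put every polygon in the same vertex order before averaging, assuming all annotators start at the same vertex. For orientations the sketch swaps in a circular mean instead of the plain arithmetic mean the text describes, because a plain mean of 350° and 10° gives 180°; for angles away from the wrap-around point the two agree closely. All function names are hypothetical.

```python
import math
from statistics import mean
from typing import List, Optional, Sequence, Set, Tuple

Point = Tuple[float, float]


def average_points(results: Sequence[Sequence[Point]]) -> List[Point]:
    """Coordinate-wise mean (midpoint) of corresponding points across results."""
    return [(mean(p[0] for p in group), mean(p[1] for p in group))
            for group in zip(*results)]


def signed_area(poly: Sequence[Point]) -> float:
    """Shoelace formula: positive for counterclockwise order with the y axis pointing up
    (the sign flips in image coordinates, which does not matter for consistency)."""
    pts = list(poly)
    return 0.5 * sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]))


def align_orientation(poly: Sequence[Point]) -> List[Point]:
    """Reverse negatively oriented polygons while keeping the starting vertex first."""
    pts = list(poly)
    if signed_area(pts) < 0:
        pts = pts[:1] + pts[:0:-1]
    return pts


def average_polygons(results: Sequence[Sequence[Point]]) -> List[Point]:
    return average_points([align_orientation(p) for p in results])


def common_options(results: Sequence[Set[str]]) -> Set[str]:
    return set.intersection(*map(set, results))


def all_options(results: Sequence[Set[str]]) -> Set[str]:
    return set.union(*map(set, results))


def common_age_range(ranges: Sequence[Tuple[int, int]]) -> Optional[Tuple[int, int]]:
    """Intersection of (low, high) ranges, or None if they do not overlap."""
    lo, hi = max(r[0] for r in ranges), min(r[1] for r in ranges)
    return (lo, hi) if lo <= hi else None


def total_age_range(ranges: Sequence[Tuple[int, int]]) -> Tuple[int, int]:
    """Smallest range covering all ranges (equal to their union when they overlap)."""
    return min(r[0] for r in ranges), max(r[1] for r in ranges)


def average_angle_deg(angles: Sequence[float]) -> float:
    """Circular mean in degrees, in [0, 360)."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c)) % 360.0
```

The choice between `common_options` and `all_options`, or between `common_age_range` and `total_age_range`, mirrors the accuracy trade-off described in the list above.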
Optionally, the annotation data processing method may further include: selecting an annotation task subset from an annotation task set; sending the average annotation result related to each task in the subset for spot checking by spot-checkers; receiving spot-check feedback information; and determining, based on the spot-check feedback, whether the annotation task set passes the spot check. When multiple annotation results pass the quality check, their average annotation result and the corresponding task can be stored. A batch of tasks can then be selected from all stored tasks and combined into an annotation task set, all of whose tasks are added to the spot-check queue. A subset of tasks is then selected from the set for spot checking. How the subset is selected can be decided as needed, and the invention does not limit it; for example, a certain proportion of the tasks can be randomly sampled from the set. This proportion can be preset, for example to 10% to 50%; because spot checks consume the spot-checkers' time, the sampling proportion can be set according to actual needs. The average annotation results related to the tasks in the subset are then sent to spot-checkers for manual review. Spot checks further determine whether the average annotation results are acceptable, which further improves annotation accuracy.
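The random sampling of a spot-check subset can be sketched as below. `sample_for_spot_check` is a hypothetical helper: `ratio` corresponds to the preset proportion (for example 0.1 to 0.5), at least one task is drawn from a non-empty set, and the optional `seed` only makes the draw reproducible.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_for_spot_check(task_set: Sequence[T], ratio: float = 0.1,
                          seed: Optional[int] = None) -> List[T]:
    """Randomly draw about `ratio` of the task set, without replacement."""
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    if not task_set:
        return []
    k = max(1, round(len(task_set) * ratio))   # never more than len(task_set)
    return random.Random(seed).sample(list(task_set), k)
```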
Optionally, the tasks in the annotation task set are tasks of the same annotation type whose annotation times fall within a preset period. Tasks of the same annotation type have identical annotation requirements and differ only in their annotation units. For face annotation, for example, if different tasks contain different face images but the same requirement, such as marking 20 key points on the face, they can be regarded as belonging to the same annotation type. Likewise, if the annotation units are different images containing people and the requirement is to mark the age range of the people in the image, such tasks also belong to the same annotation type. The annotation time is the time at which an annotation result is received, that is, the time at which the annotator provides it. Because actual annotation conditions may change over time and may be unstable, tasks whose annotation times are close together are more comparable, which makes the spot check more meaningful.
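One possible way to form annotation task sets, assumed here for illustration, is to bucket completed tasks by annotation type and by fixed time windows of their annotation time. `DoneTask`, `build_task_sets` and the `window` parameter are hypothetical; the patent only requires the same annotation type and a preset period.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


@dataclass
class DoneTask:
    task_id: str
    annotation_type: str      # tasks with the same annotation requirement share a type
    labeled_at: datetime      # time the annotation result was received


def build_task_sets(tasks: List[DoneTask],
                    window: timedelta) -> Dict[Tuple[str, int], List[DoneTask]]:
    """Group tasks by (annotation type, index of the fixed time window)."""
    epoch = datetime(1970, 1, 1)
    sets: Dict[Tuple[str, int], List[DoneTask]] = defaultdict(list)
    for t in tasks:
        bucket = (t.labeled_at - epoch) // window   # timedelta // timedelta -> int
        sets[(t.annotation_type, bucket)].append(t)
    return dict(sets)
```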
Alternatively, determine whether mark set of tasks is comprised by selective examination: obtain selective examination percent of pass based on selective examination feedback information; And by selective examination percent of pass compared with percent of pass threshold value, if selective examination percent of pass is more than or equal to percent of pass threshold value, then determine that mark set of tasks is by selective examination, if selective examination percent of pass is less than percent of pass threshold value, then determine that mark set of tasks is not by selective examination.After determining whether mark set of tasks passes through selective examination, labeled data disposal route may further include: if mark set of tasks is by selective examination, then determine the final annotation results relevant to each mark task in mark set of tasks based on selective examination feedback information; And if mark set of tasks is not by selective examination, then increase similarity threshold and go to step S120.
The spot-check feedback information is provided by spot checkers. It may directly indicate the pass rate, for example by stating the pass rate of a particular spot check. It may also indicate, for each averaged annotation result that was checked, whether it contains errors and/or how the errors should be corrected; the pass rate can then be calculated from how many of the checked averaged results were correct. The pass-rate threshold can be any suitable value, for example between 90% and 99% inclusive. If the pass rate is below the threshold, the annotation task set as a whole has failed the spot check: all of its tasks can be removed from the spot-check queue and returned to the quality-check queue, and their similarity threshold can be increased, which raises the accuracy required of their annotations. If the pass rate is greater than or equal to the threshold, the set as a whole has passed, and all of its tasks can be regarded as complete. The final annotation results can then be determined from the spot-check feedback. For example, if no errors were found during the spot check, the averaged annotation result of each task in the set can be used directly as its final result; if errors were found, the final results can be obtained after filtering out the erroneous annotations identified in the spot check.
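The pass/fail decision and its two outcomes can be sketched as below, assuming the spot-check feedback is reduced to one correct/incorrect flag per checked task. The function names, the feedback format, the default pass-rate threshold of 95% (within the 90% to 99% range mentioned above), and the fixed step by which the similarity threshold is raised are all hypothetical choices for illustration. Erroneous results are simply dropped here; a real system might instead apply the corrections supplied by the checker.

```python
def evaluate_spot_check(feedback, pass_rate_threshold=0.95,
                        similarity_threshold=0.8, threshold_step=0.05):
    """Decide whether an annotation task set passes the spot check.

    feedback: dict task_id -> True if the checker found the averaged result
    correct, False otherwise (hypothetical format).
    Returns (passed, pass_rate, similarity_threshold_to_use_next).
    """
    if not feedback:
        raise ValueError("spot-check feedback is empty")
    pass_rate = sum(feedback.values()) / len(feedback)
    if pass_rate >= pass_rate_threshold:
        # Passed: the whole task set is treated as complete.
        return True, pass_rate, similarity_threshold
    # Failed: tighten the quality check before the tasks are checked again.
    return False, pass_rate, min(1.0, similarity_threshold + threshold_step)


def final_results(averaged, feedback):
    """Keep each averaged result unless the spot checker flagged it as wrong.

    Tasks that were not sampled for the spot check keep their averaged result.
    """
    return {tid: res for tid, res in averaged.items() if feedback.get(tid, True)}
```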
Optionally, before step S110, the annotation data processing method may further include: receiving identification information of an annotator; selecting, based on the identification information, an annotation task corresponding to the annotator from a to-be-annotated queue; sending the annotation task so that the annotator can provide an annotation result for it; and receiving the annotation result provided by the annotator as one of the plurality of annotation results. As described above, a to-be-annotated queue containing a number of annotation tasks can be maintained. The annotation data processing system can send annotation tasks to annotators through an interaction device such as a user interface. Interaction between the system and the annotators can further be implemented with, for example, an application (app): an annotator opens the app and enters identification information, which can be any information that identifies the annotator, such as an account name and password. The system identifies the annotator from this information and sends tasks suited to that annotator; for example, tasks the annotator has already accepted and completed are not sent to them again. It will also be understood that an annotator can actively submit a request, for example specifying the type of annotation task they wish to work on, and the system can then select suitable tasks according to that request.
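A minimal sketch of the task-selection step, assuming the annotator has already been identified (authentication is out of scope) and that the to-be-annotated queue is a plain list of dictionaries. `select_task`, the `completed` mapping, and the optional `preferred_type` filter are hypothetical names. The task is not removed from the queue, because the same task is meant to be annotated repeatedly by several people.

```python
def select_task(queue, annotator_id, completed, preferred_type=None):
    """Pick the first queued task suitable for the given annotator.

    queue: list of dicts like {"task_id": ..., "task_type": ...}
    completed: dict annotator_id -> set of task_ids that annotator already did
    preferred_type: optional task type the annotator asked for
    Returns the chosen task dict, or None if nothing suitable is queued.
    """
    done = completed.get(annotator_id, set())
    for task in queue:
        if task["task_id"] in done:
            continue  # never resend a task this annotator already completed
        if preferred_type is not None and task["task_type"] != preferred_type:
            continue  # honour the annotator's requested task type
        return task   # left in the queue so other annotators can also label it
    return None
```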
As described above, the process of inspecting annotation results can itself be regarded as an annotation process, so inspectors can also be regarded as annotators. Optionally, therefore, the annotators and inspectors may be the same group of people: the same person can act as either annotator or inspector and switch roles flexibly as needed. This prevents a mismatch between the numbers of annotators and inspectors, avoiding situations in which annotators are overloaded while inspectors are idle, or vice versa. The steps in this implementation are essentially the same as those of the annotation data processing method described above, except that during the quality check, different people should be chosen to annotate and to inspect the same annotation task, so that the annotator and the inspector of any given task are never the same person.
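When annotators and inspectors come from one shared pool, the only hard rule stated above is that nobody inspects a task they annotated themselves. The sketch below enforces that rule and, as an assumed balancing heuristic not taken from the text, picks the eligible worker with the fewest open assignments. All names are hypothetical.

```python
def pick_inspector(pool, annotators_of_task, workload):
    """Choose an inspector for a task from the shared pool of workers.

    pool: iterable of worker ids (the same people annotate and inspect)
    annotators_of_task: ids of the workers who annotated this task
    workload: dict worker_id -> number of open assignments
    Returns the least-loaded eligible worker, or None if nobody is eligible.
    """
    # Rule from the text: the inspector must not be one of the task's annotators.
    eligible = [w for w in pool if w not in annotators_of_task]
    if not eligible:
        return None
    # Assumed heuristic: balance load; ties broken by id for determinism.
    return min(eligible, key=lambda w: (workload.get(w, 0), w))
```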
According to another aspect of the present invention, an annotation data processing system is provided. Fig. 4 shows a schematic block diagram of an annotation data processing system 400 according to an embodiment of the present invention. As shown in Fig. 4, the annotation data processing system 400 includes a calculation device 410, a similarity comparison device 420, a first execution device 430, and a second execution device 440.
The calculation device 410 calculates the similarity of a plurality of annotation results for an annotation task. The similarity comparison device 420 compares the similarity with a similarity threshold: if the similarity is greater than or equal to the threshold, it starts the first execution device 430; if the similarity is less than the threshold, it starts the second execution device 440. The first execution device 430 determines that the plurality of annotation results pass the quality check. The second execution device 440 determines that the plurality of annotation results do not pass the quality check.
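This passage does not fix a particular similarity measure, so the following sketch assumes keypoint annotations (such as the 20 facial keypoints in the face-annotation example) and defines similarity as 1 / (1 + d), where d is the mean distance between corresponding keypoints, averaged over all pairs of annotators. The metric, the data layout, and the function names are assumptions; only the threshold comparison (pass if similarity is greater than or equal to the threshold) follows the text.

```python
from itertools import combinations
from math import dist


def similarity(results):
    """Similarity of several keypoint annotations of the same item.

    results: list of annotations, each a list of (x, y) keypoints in the same order.
    Assumed metric: 1 / (1 + mean keypoint distance over all annotator pairs),
    so identical annotations score 1.0 and the score falls as they diverge.
    """
    if len(results) < 2:
        raise ValueError("need at least two annotation results")
    pair_means = []
    for a, b in combinations(results, 2):
        # Mean Euclidean distance between corresponding keypoints of one pair.
        pair_means.append(sum(dist(p, q) for p, q in zip(a, b)) / len(a))
    return 1.0 / (1.0 + sum(pair_means) / len(pair_means))


def passes_quality_check(results, similarity_threshold):
    # Pass if and only if similarity >= threshold; otherwise fail.
    return similarity(results) >= similarity_threshold
```

Because d is measured in pixels here, a practical version would probably normalize it, for example by face size, so that a single threshold works across images of different resolutions.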
Any one or more of the calculation device 410, the similarity comparison device 420, the first execution device 430, and the second execution device 440 can be implemented with any suitable hardware, software, and/or firmware, for example an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a digital signal processor (DSP). Any one or more of these devices can be integrated with other devices in the annotation data processing system 400 or implemented as separate devices. The calculation device 410 and the similarity comparison device 420 can be connected directly or indirectly, for example by a wired or wireless connection; the same applies to the connection between the similarity comparison device 420 and the first execution device 430 or the second execution device 440.
Because the annotation data processing system provided by the present invention uses similarity to check the quality of annotation results automatically, annotators can learn the quality of their annotation results promptly and correct annotation errors in time, which effectively improves annotation accuracy.
Optionally, the annotation data processing system 400 may further include an acquisition device, a judging device, and an annotation result determining device (not shown). The acquisition device obtains a certain number of annotation results for the annotation task. The judging device judges whether this number equals the count threshold associated with the annotation task; if it does, the judging device starts the annotation result determining device, and otherwise it starts the acquisition device again. The annotation result determining device takes the obtained annotation results as the plurality of annotation results and starts the calculation device. As with the calculation device 410, the similarity comparison device 420, the first execution device 430, and the second execution device 440, any one or more of the acquisition device, the judging device, and the annotation result determining device can be implemented with any suitable hardware, software, and/or firmware.
Optionally, the second execution device 440 can further increase the count threshold and start the acquisition device. When the plurality of annotation results fails the quality check, the count threshold can be increased so that more annotation results are obtained for the quality check. It will be understood that, as described above, the process in which inspectors manually check annotation results and provide inspection feedback places relatively low demands on the inspectors' checking precision, whereas the demands on the annotators' accuracy are higher. Manual inspection can therefore be omitted, and annotation quality ensured directly by the quality check through increasing the count threshold. For annotation tasks whose annotation results differ widely, the count threshold can be increased, that is, the task is annotated more times. For tasks whose results clearly agree, or where most results are close to one another, the results can simply be averaged to obtain the averaged annotation result, as described above. In this way, annotation labor is used as efficiently as possible, while repeated annotation of the same task by multiple people improves annotation quality.
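The interplay between the count threshold and the quality check (collect results until the threshold is reached, check them, and raise the threshold when the check fails) can be sketched as a loop. `get_result` and `quality_check` are caller-supplied callables, and the starting threshold, the increment, and the upper limit `max_count` are assumed values. The text does not say when to stop raising the threshold, so the cap is an added safeguard that hands the task back unresolved, for example for manual inspection.

```python
def collect_and_check(get_result, quality_check, count_threshold=3,
                      count_step=2, max_count=9):
    """Gather annotation results until they pass the quality check.

    get_result: callable returning one more annotation result for the task
    quality_check: callable(list_of_results) -> bool, e.g. a similarity test
    Returns (passed, results).
    """
    results = []
    while True:
        # Keep requesting results until the current count threshold is reached.
        while len(results) < count_threshold:
            results.append(get_result())
        # Run the quality check on all results gathered so far.
        if quality_check(results):
            return True, results
        # Assumed safeguard: stop escalating once the cap has been reached.
        if count_threshold >= max_count:
            return False, results
        # Check failed: raise the count threshold so more results are collected.
        count_threshold = min(max_count, count_threshold + count_step)
```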
Optionally, the second execution device 440 may further include an annotation result sending module, an inspection feedback receiving module, and an inspection feedback sending module. The annotation result sending module sends the plurality of annotation results for inspection by an inspector. The inspection feedback receiving module receives inspection feedback information. The inspection feedback sending module sends the inspection feedback information to inform the annotators. When the similarity of the annotation results is low, so that they fail the quality check, the results can be sent to an inspector for checking; after checking them, the inspector can provide inspection feedback. Combining manual inspection by inspectors with similarity-based quality checking further improves the accuracy of the annotation results.
Optionally, the first execution device 430 can further average the plurality of annotation results to obtain an averaged annotation result for the annotation task, which can be used for spot checking. When the annotation results pass the quality check, the averaged annotation result can be determined and stored so that spot checkers can spot-check it later. How the averaged annotation result is determined has been described above and is not repeated here.
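Assuming each annotation result is a list of (x, y) keypoints in a fixed order, a plain point-by-point arithmetic mean gives an averaged annotation result. This is an illustrative choice; the averaging method described earlier in the document may differ, for instance by weighting annotators or discarding outliers.

```python
def average_keypoints(results):
    """Average several keypoint annotations point by point.

    results: list of annotations, each a list of (x, y) tuples in the same order.
    Returns one list of (x, y) tuples: the averaged annotation result.
    """
    if not results:
        raise ValueError("no annotation results to average")
    n_points = len(results[0])
    if any(len(r) != n_points for r in results):
        raise ValueError("annotations have different numbers of keypoints")
    n = len(results)
    # Mean x and mean y of the i-th keypoint across all annotators.
    return [
        (sum(r[i][0] for r in results) / n, sum(r[i][1] for r in results) / n)
        for i in range(n_points)
    ]
```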
Optionally, the annotation data processing system 400 may further include a subset selecting device, a first sending device, a first receiving device, and a spot-check determining device (not shown). The subset selecting device selects an annotation task subset from an annotation task set. The first sending device sends the averaged annotation result of each annotation task in the subset for spot checking by spot checkers. The first receiving device receives spot-check feedback information. The spot-check determining device determines, based on the spot-check feedback information, whether the annotation task set passes the spot check. It will be understood that the annotation data processing system 400 can include a storage device (not shown) for storing annotation tasks that have passed the quality check. A batch of tasks can be selected from all stored tasks and combined into an annotation task set, from which a portion of the tasks can then be selected, for example by random sampling, as the annotation task subset for spot checking. Spot checking further verifies whether the averaged annotation results of the annotation tasks are acceptable, further improving annotation accuracy.
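Random sampling of the spot-check subset might look like the sketch below. The sampling ratio and the rule of always drawing at least one task are assumptions; the text only says that part of the task set is selected, for example at random. Passing a seed makes the draw reproducible, which can help when auditing a spot check.

```python
import math
import random


def sample_for_spot_check(task_set, ratio=0.1, seed=None):
    """Randomly draw a subset of an annotation task set for spot checking.

    ratio: assumed fraction of tasks to check.
    At least one task is always drawn from a non-empty set.
    """
    if not task_set:
        return []
    k = min(len(task_set), max(1, math.ceil(ratio * len(task_set))))
    # Sampling without replacement: no task is checked twice.
    return random.Random(seed).sample(list(task_set), k)
```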
Optionally, the annotation tasks in the annotation task set are tasks of the same annotation type whose annotation times fall within a preset time period. Tasks of the same type with close annotation times are more comparable and more meaningful as references for one another.
Optionally, the spot-check determining device may include a pass rate obtaining module and a pass rate comparing module. The pass rate obtaining module obtains a spot-check pass rate based on the spot-check feedback information. The pass rate comparing module compares the pass rate with a pass-rate threshold: if the pass rate is greater than or equal to the threshold, the annotation task set is determined to have passed the spot check; if it is less than the threshold, the set is determined to have failed. The annotation data processing system 400 may further include a final annotation result determining device and a similarity threshold increasing device (not shown). If the annotation task set passes the spot check, the final annotation result determining device determines the final annotation result of each task in the set based on the spot-check feedback information. If the set fails the spot check, the similarity threshold increasing device increases the similarity threshold and starts the similarity comparison device.
The spot-check pass rate can be provided directly by the spot checkers or determined from their judgments of which annotation results are correct and which are wrong; the present invention does not limit this. The pass rate determines whether the annotation task set passes the spot check, that is, whether the averaged annotation results of its tasks are acceptable and meet the requirements. Based on this, either the final annotation results are determined, or the similarity threshold is increased and the quality check is performed again.
Optionally, the annotation data processing system 400 may further include a second receiving device, an annotation task selecting device, a second sending device, and a third receiving device (not shown). The second receiving device receives identification information of an annotator. The annotation task selecting device selects, based on the identification information, an annotation task corresponding to the annotator from a to-be-annotated queue. The second sending device sends the annotation task so that the annotator can provide an annotation result for it. The third receiving device receives the annotation result provided by the annotator as one of the plurality of annotation results. As described above, the annotation data processing system 400 can select tasks suited to an annotator based on the annotator's identification information and send those tasks to the annotator; this is not repeated here.
The embodiments and advantages of each step of the annotation data processing method provided by the present invention have been described above with reference to Figs. 1 to 3. By reading the detailed description of the method above, those of ordinary skill in the art can understand the structure, operation, and advantages of the annotation data processing system 400, so they are not repeated here.
The methods and devices provided herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems can also be used with the teachings herein, and the structure required to construct such systems is apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described here can be implemented in a variety of programming languages, and that any description of a specific language above is given to disclose the best mode of the invention.
Numerous specific details are set forth in the description provided here. However, it will be understood that embodiments of the invention can be practiced without these specific details. In some instances, well-known methods, structures, and techniques are not shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, to streamline the disclosure and aid understanding of one or more of the various inventive aspects, the features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will understand that, except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.
Furthermore, those skilled in the art will understand that, although some embodiments described herein include certain features that are included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
The various component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some modules in the annotation data processing system according to embodiments of the invention. The invention may also be implemented as device programs (for example, computer programs and computer program products) for performing part or all of the methods described herein. Such programs implementing the invention may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claims. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices can be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any order; these words may be interpreted as names.

Claims (18)

1. An annotation data processing method, comprising:
Step S110: calculating a similarity of a plurality of annotation results for an annotation task;
Step S120: comparing the similarity with a similarity threshold, and going to step S130 if the similarity is greater than or equal to the similarity threshold, or to step S140 if the similarity is less than the similarity threshold;
Step S130: determining that the plurality of annotation results pass a quality check; and
Step S140: determining that the plurality of annotation results do not pass the quality check.
2. The annotation data processing method according to claim 1, wherein, before step S110, the method further comprises:
Step S102: obtaining a certain number of annotation results for the annotation task;
Step S104: judging whether the number equals a count threshold associated with the annotation task, and going to step S106 if it does, or otherwise returning to step S102; and
Step S106: determining the obtained annotation results as the plurality of annotation results and going to step S110.
3. The annotation data processing method according to claim 2, wherein step S140 further comprises: increasing the count threshold and going to step S102.
4. The annotation data processing method according to any one of claims 1 to 3, wherein
step S140 further comprises:
sending the plurality of annotation results for inspection by an inspector;
receiving inspection feedback information; and
sending the inspection feedback information to inform an annotator.
5. The annotation data processing method according to claim 1, wherein step S130 further comprises: averaging the plurality of annotation results to obtain an averaged annotation result for the annotation task;
wherein the averaged annotation result is used for spot checking.
6. The annotation data processing method according to claim 5, further comprising:
selecting an annotation task subset from an annotation task set;
sending the averaged annotation result of each annotation task in the annotation task subset for spot checking by a spot checker;
receiving spot-check feedback information; and
determining, based on the spot-check feedback information, whether the annotation task set passes the spot check.
7. The annotation data processing method according to claim 6, wherein
the determining whether the annotation task set passes the spot check comprises:
obtaining a spot-check pass rate based on the spot-check feedback information; and
comparing the spot-check pass rate with a pass-rate threshold, and determining that the annotation task set passes the spot check if the spot-check pass rate is greater than or equal to the pass-rate threshold, or that the annotation task set does not pass the spot check if the spot-check pass rate is less than the pass-rate threshold;
after the determining whether the annotation task set passes the spot check, the annotation data processing method further comprises:
if the annotation task set passes the spot check, determining a final annotation result for each annotation task in the annotation task set based on the spot-check feedback information; and
if the annotation task set does not pass the spot check, increasing the similarity threshold and going to step S120.
8. The annotation data processing method according to claim 6 or 7, wherein the annotation tasks in the annotation task set are of the same annotation type and have annotation times within a preset time period.
9. The annotation data processing method according to claim 1, wherein, before step S110, the method further comprises:
receiving identification information of an annotator;
selecting the annotation task from a to-be-annotated queue based on the identification information, the annotation task corresponding to the annotator;
sending the annotation task so that the annotator provides an annotation result for the annotation task; and
receiving the annotation result provided by the annotator as one of the plurality of annotation results.
10. An annotation data processing system, comprising a calculation device, a similarity comparison device, a first execution device, and a second execution device, wherein
the calculation device is configured to calculate a similarity of a plurality of annotation results for an annotation task;
the similarity comparison device is configured to compare the similarity with a similarity threshold, and to start the first execution device if the similarity is greater than or equal to the similarity threshold, or the second execution device if the similarity is less than the similarity threshold;
the first execution device is configured to determine that the plurality of annotation results pass a quality check; and
the second execution device is configured to determine that the plurality of annotation results do not pass the quality check.
11. The annotation data processing system according to claim 10, further comprising an acquisition device, a judging device, and an annotation result determining device, wherein:
the acquisition device is configured to obtain a certain number of annotation results for the annotation task;
the judging device is configured to judge whether the number equals a count threshold associated with the annotation task, and to start the annotation result determining device if it does, or otherwise to start the acquisition device; and
the annotation result determining device is configured to determine the obtained annotation results as the plurality of annotation results and to start the calculation device.
12. The annotation data processing system according to claim 11, wherein the second execution device is further configured to increase the count threshold and start the acquisition device.
13. The annotation data processing system according to any one of claims 10 to 12, wherein the second execution device further comprises:
an annotation result sending module configured to send the plurality of annotation results for inspection by an inspector;
an inspection feedback receiving module configured to receive inspection feedback information; and
an inspection feedback sending module configured to send the inspection feedback information to inform an annotator.
14. The annotation data processing system according to claim 10, wherein the first execution device is further configured to average the plurality of annotation results to obtain an averaged annotation result for the annotation task;
wherein the averaged annotation result is used for spot checking.
15. The annotation data processing system according to claim 14, further comprising:
a subset selecting device configured to select an annotation task subset from an annotation task set;
a first sending device configured to send the averaged annotation result of each annotation task in the annotation task subset for spot checking by a spot checker;
a first receiving device configured to receive spot-check feedback information; and
a spot-check determining device configured to determine, based on the spot-check feedback information, whether the annotation task set passes the spot check.
16. The annotation data processing system according to claim 15, wherein
the spot-check determining device comprises:
a pass rate obtaining module configured to obtain a spot-check pass rate based on the spot-check feedback information; and
a pass rate comparing module configured to compare the spot-check pass rate with a pass-rate threshold, and to determine that the annotation task set passes the spot check if the spot-check pass rate is greater than or equal to the pass-rate threshold, or that the annotation task set does not pass the spot check if the spot-check pass rate is less than the pass-rate threshold;
the annotation data processing system further comprises:
a final annotation result determining device configured to, if the annotation task set passes the spot check, determine a final annotation result for each annotation task in the annotation task set based on the spot-check feedback information; and
a similarity threshold increasing device configured to, if the annotation task set does not pass the spot check, increase the similarity threshold and start the similarity comparison device.
17. The annotation data processing system according to claim 15 or 16, wherein the annotation tasks in the annotation task set are of the same annotation type and have annotation times within a preset time period.
18. The annotation data processing system according to claim 10, further comprising:
a second receiving device configured to receive identification information of an annotator;
an annotation task selecting device configured to select the annotation task from a to-be-annotated queue based on the identification information, the annotation task corresponding to the annotator;
a second sending device configured to send the annotation task so that the annotator provides an annotation result for the annotation task; and
a third receiving device configured to receive the annotation result provided by the annotator as one of the plurality of annotation results.
CN201510744484.5A 2015-11-03 2015-11-03 Labeled data processing method and labeled data processing system Active CN105404896B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510744484.5A CN105404896B (en) 2015-11-03 2015-11-03 Labeled data processing method and labeled data processing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510744484.5A CN105404896B (en) 2015-11-03 2015-11-03 Labeled data processing method and labeled data processing system

Publications (2)

Publication Number Publication Date
CN105404896A true CN105404896A (en) 2016-03-16
CN105404896B CN105404896B (en) 2019-04-19

Family

ID=55470371

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510744484.5A Active CN105404896B (en) 2015-11-03 2015-11-03 Labeled data processing method and labeled data processing system

Country Status (1)

Country Link
CN (1) CN105404896B (en)

CN111932536A (en) * 2020-09-29 2020-11-13 平安国际智慧城市科技股份有限公司 Method and device for verifying lesion marking, computer equipment and storage medium
CN111966674A (en) * 2020-08-25 2020-11-20 北京金山云网络技术有限公司 Method and device for judging qualification of labeled data and electronic equipment
CN111986194A (en) * 2020-09-03 2020-11-24 平安国际智慧城市科技股份有限公司 Medical annotation image detection method and device, electronic equipment and storage medium
CN112084241A (en) * 2020-09-23 2020-12-15 北京金山云网络技术有限公司 Method and device for screening labeled data and electronic equipment
CN112347990A (en) * 2020-11-30 2021-02-09 重庆空间视创科技有限公司 Multimode-based intelligent manuscript examining system and method
CN112528609A (en) * 2019-08-29 2021-03-19 北京声智科技有限公司 Method, system, equipment and medium for quality inspection of labeled data
WO2023126280A1 (en) 2021-12-30 2023-07-06 Robert Bosch Gmbh A system and method for quality check of labelled images

Citations (6)

Publication number Priority date Publication date Assignee Title
CN102867025A (en) * 2012-08-23 2013-01-09 百度在线网络技术(北京)有限公司 Method and device for acquiring picture marking data
CN103324620A (en) * 2012-03-20 2013-09-25 北京百度网讯科技有限公司 Method and device for rectifying marking results
CN103678685A (en) * 2013-12-26 2014-03-26 华为技术有限公司 Webpage labeling method and device
CN104573359A (en) * 2014-12-31 2015-04-29 浙江大学 Method for integrating crowdsource annotation data based on task difficulty and annotator ability
CN104615705A (en) * 2015-01-30 2015-05-13 百度在线网络技术(北京)有限公司 Web page quality detection method and device
CN104795077A (en) * 2015-03-17 2015-07-22 北京航空航天大学 Voice annotation quality consistency detection method

Cited By (40)

Publication number Priority date Publication date Assignee Title
CN105975980A (en) * 2016-04-27 2016-09-28 百度在线网络技术(北京)有限公司 Method of monitoring image mark quality and apparatus thereof
CN105975980B (en) * 2016-04-27 2019-04-05 百度在线网络技术(北京)有限公司 The method and apparatus of monitoring image mark quality
CN106489149A (en) * 2016-06-29 2017-03-08 深圳狗尾草智能科技有限公司 A kind of data mask method based on data mining and mass-rent and system
CN108229772A (en) * 2016-12-14 2018-06-29 北京国双科技有限公司 Mark processing method and processing device
CN108537240A (en) * 2017-03-01 2018-09-14 华东师范大学 Commodity image semanteme marking method based on domain body
TWI729331B (en) * 2018-01-11 2021-06-01 開曼群島商創新先進技術有限公司 Image annotation information processing method, device, server and system
CN108197658A (en) * 2018-01-11 2018-06-22 阿里巴巴集团控股有限公司 Image labeling information processing method, device, server and system
WO2019137196A1 (en) * 2018-01-11 2019-07-18 阿里巴巴集团控股有限公司 Image annotation information processing method and device, server and system
CN108197658B (en) * 2018-01-11 2020-08-14 阿里巴巴集团控股有限公司 Image annotation information processing method, device, server and system
CN108154197B (en) * 2018-01-22 2022-03-15 腾讯科技(深圳)有限公司 Method and device for realizing image annotation verification in virtual scene
CN108154197A (en) * 2018-01-22 2018-06-12 腾讯科技(深圳)有限公司 Realize the method and device that image labeling is verified in virtual scene
CN108364018A (en) * 2018-01-25 2018-08-03 北京墨丘科技有限公司 A kind of guard method of labeled data, terminal device and system
CN108446695A (en) * 2018-02-06 2018-08-24 阿里巴巴集团控股有限公司 Method, apparatus and electronic equipment for data mark
CN108446695B (en) * 2018-02-06 2022-02-11 创新先进技术有限公司 Method and device for data annotation and electronic equipment
CN110400029A (en) * 2018-04-24 2019-11-01 北京京东尚科信息技术有限公司 A kind of method and system of mark management
CN109034188A (en) * 2018-06-15 2018-12-18 北京金山云网络技术有限公司 Acquisition methods, acquisition device, equipment and the storage medium of machine learning model
CN108960297A (en) * 2018-06-15 2018-12-07 北京金山云网络技术有限公司 Mask method, annotation equipment, equipment and the storage medium of picture
CN109034188B (en) * 2018-06-15 2021-11-05 北京金山云网络技术有限公司 Method and device for acquiring machine learning model, equipment and storage medium
CN109214343A (en) * 2018-09-14 2019-01-15 北京字节跳动网络技术有限公司 Method and apparatus for generating face critical point detection model
CN109784381A (en) * 2018-12-27 2019-05-21 广州华多网络科技有限公司 Markup information processing method, device and electronic equipment
CN110188769B (en) * 2019-05-14 2023-09-05 广州虎牙信息科技有限公司 Method, device, equipment and storage medium for auditing key point labels
CN110188769A (en) * 2019-05-14 2019-08-30 广州虎牙信息科技有限公司 Checking method, device, equipment and the storage medium of key point mark
CN110210526A (en) * 2019-05-14 2019-09-06 广州虎牙信息科技有限公司 Predict method, apparatus, equipment and the storage medium of the key point of measurand
CN110263934A (en) * 2019-05-31 2019-09-20 中国信息通信研究院 A kind of artificial intelligence data mask method and device
CN110263934B (en) * 2019-05-31 2021-08-06 中国信息通信研究院 Artificial intelligence data labeling method and device
CN112528609A (en) * 2019-08-29 2021-03-19 北京声智科技有限公司 Method, system, equipment and medium for quality inspection of labeled data
CN110750523A (en) * 2019-09-12 2020-02-04 苏宁云计算有限公司 Data annotation method, system, computer equipment and storage medium
CN110796185A (en) * 2019-10-17 2020-02-14 北京爱数智慧科技有限公司 Method and device for detecting image annotation result
CN110880021A (en) * 2019-11-06 2020-03-13 创新奇智(北京)科技有限公司 Model-assisted data annotation system and annotation method
CN110880021B (en) * 2019-11-06 2021-03-16 创新奇智(北京)科技有限公司 Model-assisted data annotation system and annotation method
CN111966674A (en) * 2020-08-25 2020-11-20 北京金山云网络技术有限公司 Method and device for judging qualification of labeled data and electronic equipment
CN111966674B (en) * 2020-08-25 2024-03-15 北京金山云网络技术有限公司 Method and device for judging eligibility of annotation data and electronic equipment
CN111986194A (en) * 2020-09-03 2020-11-24 平安国际智慧城市科技股份有限公司 Medical annotation image detection method and device, electronic equipment and storage medium
CN112084241A (en) * 2020-09-23 2020-12-15 北京金山云网络技术有限公司 Method and device for screening labeled data and electronic equipment
CN111932536A (en) * 2020-09-29 2020-11-13 平安国际智慧城市科技股份有限公司 Method and device for verifying lesion marking, computer equipment and storage medium
WO2022068228A1 (en) * 2020-09-29 2022-04-07 平安国际智慧城市科技股份有限公司 Lesion mark verification method and apparatus, and computer device and storage medium
CN111932536B (en) * 2020-09-29 2021-03-05 平安国际智慧城市科技股份有限公司 Method and device for verifying lesion marking, computer equipment and storage medium
CN112347990A (en) * 2020-11-30 2021-02-09 重庆空间视创科技有限公司 Multimode-based intelligent manuscript examining system and method
CN112347990B (en) * 2020-11-30 2024-02-02 重庆空间视创科技有限公司 Multi-mode-based intelligent manuscript examining system and method
WO2023126280A1 (en) 2021-12-30 2023-07-06 Robert Bosch Gmbh A system and method for quality check of labelled images

Also Published As

Publication number Publication date
CN105404896B (en) 2019-04-19

Similar Documents

Publication Publication Date Title
CN105404896A (en) Annotation data processing method and annotation data processing system
CN110275958B (en) Website information identification method and device and electronic equipment
CN110245716B (en) Sample labeling auditing method and device
CN111240653B (en) Interface document generation method, device and readable storage medium
CN105468507B (en) Branch standard reaching detection method and device
CN110619497B (en) Address checking method, device, electronic equipment and storage medium
CN111510468B (en) Scheduling method and device of computing task, server and computing system
US11386499B2 (en) Car damage picture angle correction method, electronic device, and readable storage medium
CN106485436B (en) Express receiving verification method and device
CN107632909B (en) Method and system for automatically testing device functions
CN110969387A (en) Order distribution method, server, terminal and system
CN112989768B (en) Method and device for correcting connection questions, electronic equipment and storage medium
CN108898196B (en) Logistics patrol monitoring method and device and patrol terminal
CN112488257B (en) Method and equipment for preventing throwing errors in manual feeding, storage medium and feeding system
CN110600090B (en) Clinical examination data processing method, device, medium and terminal equipment
CN112181485A (en) Script execution method and device, electronic equipment and storage medium
CN113572826B (en) Device information binding method and system and electronic device
CN115758389A (en) Vulnerability processing result checking method and device, electronic equipment and storage medium
CN111966394B (en) ETL-based data analysis method, device, equipment and storage medium
CN111400245B (en) Art resource migration method and device
US20090285447A1 (en) Correcting video coding errors using an automatic recognition result
CN106685966B (en) Method, device and system for detecting leakage information
CN117421907B (en) Household garbage incineration flue gas purification system
CN111046420B (en) Method and device for acquiring information of energy equipment
CN112529039B (en) Method and device for checking material information of main board and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 313, Block A, No. 2 Academy of Sciences South Road, Haidian District, Beijing 100190

Applicant after: MEGVII INC.

Applicant after: Beijing Maigewei Technology Co., Ltd.

Address before: Room 313, Block A, No. 2 Academy of Sciences South Road, Haidian District, Beijing 100190

Applicant before: MEGVII INC.

Applicant before: Beijing Aperture Science and Technology Ltd.

CB02 Change of applicant information
GR01 Patent grant
GR01 Patent grant