CN108446695A - Method, apparatus and electronic device for data annotation

Info

Publication number
CN108446695A
Authority
CN
China
Prior art keywords
result
character
data
recognition
marked
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810115780.2A
Other languages
Chinese (zh)
Other versions
CN108446695B (en)
Inventor
张兰渝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN202210197077.7A (CN114677681A)
Priority to CN201810115780.2A (CN108446695B)
Publication of CN108446695A
Application granted
Publication of CN108446695B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Character Discrimination (AREA)

Abstract

The embodiments of the present application disclose a method, apparatus and electronic device for data annotation. The method includes: obtaining multiple recognition results produced by performing multiple rounds of recognition on data to be annotated, where the data to be annotated includes at least one character; when it is determined that the multiple recognition results contain differing recognition results, determining a candidate result and an interference result among the multiple recognition results; and judging, according to the candidate result and the interference result, the recognition status of the data to be annotated, where the recognition status is either recognition success or recognition failure.

Description

Method, apparatus and electronic device for data annotation
Technical field
The present application relates to the field of computer data processing, and more particularly to a method, apparatus and electronic device for data annotation.
Background
To protect their data, most crawler channels currently present data to users in the form of pictures, which raises the threshold for data acquisition. Being able to parse data displayed as pictures (image data) improves a crawler channel's competitiveness in the market. At present, manual annotation is required in image-data recognition scenarios such as optical character recognition, CAPTCHA recognition and handwritten text recognition. The image data is recognized manually, and the annotation set formed from the resulting recognition results has very important uses.
However, the existing manual annotation process consumes a large amount of manpower and material resources, and human error cannot be avoided, so the effective utilization rate of the recognition results (in other words, the annotation results) cannot be guaranteed.
Therefore, a method for data annotation is needed to overcome the above technical problem.
Summary of the invention
The purpose of the present application is to provide a method, apparatus and electronic device for data annotation that can guarantee the effective utilization rate of recognition results.
To solve the above technical problem, the embodiments of the present application are implemented as follows:
In a first aspect, a method for data annotation is provided, including:
obtaining multiple recognition results produced by performing multiple rounds of recognition on data to be annotated, where the data to be annotated includes at least one character;
when it is determined that the multiple recognition results contain differing recognition results, determining a candidate result and an interference result among the multiple recognition results; and
judging, according to the candidate result and the interference result, the recognition status of the data to be annotated, where the recognition status is either recognition success or recognition failure.
In a second aspect, an apparatus for data annotation is provided, including:
an obtaining unit, which obtains multiple recognition results produced by performing multiple rounds of recognition on data to be annotated, where the data to be annotated includes at least one character;
a determination unit, which, when it is determined that the multiple recognition results contain differing recognition results, determines a candidate result and an interference result among the multiple recognition results; and
a judging unit, which judges, according to the candidate result and the interference result, the recognition status of the data to be annotated, where the recognition status is either recognition success or recognition failure.
In a third aspect, an electronic device is provided, including:
a processor; and
a memory arranged to store computer-executable instructions which, when executed, cause the processor to perform the following operations:
obtaining multiple recognition results produced by performing multiple rounds of recognition on data to be annotated, where the data to be annotated includes at least one character;
when it is determined that the multiple recognition results contain differing recognition results, determining a candidate result and an interference result among the multiple recognition results; and
judging, according to the candidate result and the interference result, the recognition status of the data to be annotated, where the recognition status is either recognition success or recognition failure.
In a fourth aspect, a computer-readable medium is provided. The computer-readable medium stores one or more programs which, when executed by an electronic device that includes multiple application programs, cause the electronic device to perform the following operations:
obtaining multiple recognition results produced by performing multiple rounds of recognition on data to be annotated, where the data to be annotated includes at least one character;
when it is determined that the multiple recognition results contain differing recognition results, determining a candidate result and an interference result among the multiple recognition results; and
judging, according to the candidate result and the interference result, the recognition status of the data to be annotated, where the recognition status is either recognition success or recognition failure.
As can be seen from the technical solutions provided above, when the multiple recognition results for the data to be annotated contain differing recognition results, the embodiments of the present application can judge the recognition status of the data according to the candidate result and the interference result among those recognition results. This avoids the problem that arises when the recognition status can be judged only if all of the recognition results are identical, namely that the effective utilization rate of the recognition results cannot be guaranteed, and thereby guarantees that rate.
Description of the drawings
To explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some of the embodiments described in the present application; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a method for data annotation according to one embodiment of the present application.
Fig. 2 is a schematic flowchart of a method for data annotation according to a specific embodiment of the present application.
Fig. 3 is a schematic structural diagram of an electronic device according to one embodiment of the present application.
Fig. 4 is a schematic structural diagram of an apparatus for data annotation according to one embodiment of the present application.
Detailed description
To help those skilled in the art better understand the technical solutions of the present application, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present application.
Fig. 1 is a flowchart of a method for data annotation according to one embodiment of the present application. The method of Fig. 1 is performed by an apparatus for data annotation. It should be understood that the solution of the embodiments is applicable to annotating text-type image data; applying the method to the annotation of other data is of course not excluded.
As shown in Fig. 1, at S102, multiple recognition results produced by performing multiple rounds of recognition on data to be annotated are obtained, where the data to be annotated includes at least one character.
It can be understood that in each round the data to be annotated can be recognized by several people, for example by 3 people per round. The participants in any two rounds can differ: for example, if the participants in the first round are A, B and C, the participants in the second round are D, E and F. Each person recognizes the data to be annotated and produces one recognition result.
Optionally, in some embodiments, the total number of recognition rounds performed on the data to be annotated and/or the number of participants in each round is determined according to a timeliness requirement. For example, the rule may be that the higher the timeliness requirement, the fewer the total rounds and/or the fewer the participants per round.
At S104, when it is determined that the multiple recognition results contain differing recognition results, a candidate result and an interference result among the multiple recognition results are determined.
Optionally, in some embodiments, if the multiple recognition results are all identical, the accuracy of the recognition result can be calculated directly, and the recognition status of the data to be annotated is judged according to that accuracy and the accuracy requirement.
Specifically, in some embodiments, the candidate result and the interference result are determined according to occurrence count: the candidate result is the recognition result that occurs most often among the multiple recognition results, and the interference results are the recognition results other than the candidate result. The occurrence count can also be understood as a vote count, that is, the number of times the same recognition result was obtained when recognizing the data to be annotated.
For example, suppose the data to be annotated contains 4 characters (numbered 0-3) and the recognition results obtained from 6 people are 1234, 1324, 1234, 1342, 1234 and 123. Here 1234 occurs 3 times, and 1324, 1342 and 123 each occur once, so 1234 is determined to be the candidate result and 1324, 1342 and 123 are determined to be interference results.
Further, in some embodiments, an interference result contains the same number of characters as the candidate result. In other words, when determining the candidate result and the interference results among the multiple recognition results, the recognition result with the most occurrences is determined to be the candidate result, and the other recognition results whose character count equals that of the candidate result are determined to be interference results. Any recognition result whose character count differs from that of the candidate result can be ignored; this can be understood as discarding misaligned recognition results.
For example, suppose the data to be annotated contains 4 characters (numbered 0-3) and the recognition results obtained from 6 people are 1234, 1324, 1234, 1342, 1234 and 123. Here 1234 occurs 3 times, and 1324, 1342 and 123 each occur once, so 1234 is determined to be the candidate result, 1324 and 1342 are determined to be interference results, and 123 is discarded as misaligned.
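The selection rule above can be sketched in Python as follows. This is a minimal illustration rather than the patent's implementation: the function name `split_results` is hypothetical, and the text does not say how ties between equally frequent results are resolved, so the sketch simply takes whatever `Counter.most_common` returns first.

```python
from collections import Counter

def split_results(results):
    """Split raw recognition results into candidate, interference and discarded results."""
    counts = Counter(results)
    # Candidate: the most frequent recognition result (tie handling is an assumption).
    candidate, candidate_votes = counts.most_common(1)[0]
    # Interference: other results with the same number of characters as the candidate.
    interference = {r: c for r, c in counts.items()
                    if r != candidate and len(r) == len(candidate)}
    # Discarded: results whose character count differs from the candidate's.
    discarded = [r for r in counts if len(r) != len(candidate)]
    return candidate, candidate_votes, interference, discarded

results = ["1234", "1324", "1234", "1342", "1234", "123"]
candidate, votes, interference, discarded = split_results(results)
print(candidate, votes, interference, discarded)
```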
At S106, the recognition status of the data to be annotated is judged according to the candidate result and the interference result, where the recognition status is either recognition success or recognition failure.
Specifically, in some embodiments, the type of the candidate result is judged according to a distribution algorithm model, the occurrence count of the candidate result and the occurrence count of the interference result. The type of the candidate result is one of: confirmed valid result, confirmed invalid result, and pending valid result. The distribution algorithm model is trained based on the accuracy requirement and the recognition results of training data to be annotated. The recognition status of the data to be annotated is then judged according to the type of the candidate result.
Optionally, as an example, when training the distribution algorithm model, the accuracy requirement is determined, and the total number of recognition rounds and the number of participants per round are determined according to the applicable timeliness requirement. The cumulative total number of participants is then counted, the distributions of all recognition results that may occur are analyzed, and the estimated accuracy of each distribution is determined. The distribution algorithm model is then determined from the estimated accuracy of each distribution and the accuracy requirement.
As an example, assume that the training data to be annotated used to train the distribution algorithm model contains the ten characters 0-9, that the recognition accuracy of a character is a (understood here as a lower bound on character recognition accuracy), that each of the ten characters is initially equally likely to be misrecognized as any other character, with probability (1-a)/9, and that a = 0.9. The estimated accuracy of each distribution can then be calculated.
For example, the distributions and their estimated accuracies are listed below. A distribution (n1, n2, ...) means that n1 people gave one identical recognition result, n2 people gave a second, different result, and so on.
(2): 2 people, identical results; estimated accuracy 99.86301369863013%.
(3): 3 people, identical results; estimated accuracy 99.99830651989839%.
(4): 4 people, identical results; estimated accuracy 99.99997909248857%.
(5): 5 people, identical results; estimated accuracy 99.99999974188252%.
(6): 6 people, identical results; estimated accuracy 99.99999999681336%.
(2,1): 3 people, 2 identical and 1 different from those 2; estimated accuracy 98.66165413533837%.
(3,1): 4 people, 3 identical and 1 different from those 3; estimated accuracy 99.98325588395764%.
(3,2): 5 people, 3 identical, the other 2 identical to each other but different from those 3; estimated accuracy 98.77901897734243%.
(4,1): 5 people, 4 identical and 1 different from those 4; estimated accuracy 99.99979324832664%.
(4,2): 6 people, 4 identical, the other 2 identical to each other but different from those 4; estimated accuracy 99.9847421648845%.
(5,1): 6 people, 5 identical and 1 different from those 5; estimated accuracy 99.999997447505%.
(4,1,1): 6 people, 4 identical, the other 2 different from those 4 and from each other; estimated accuracy 99.98994724384438%.
(3,1,1): 5 people, 3 identical, the other 2 different from those 3 and from each other; estimated accuracy 98.96391181294628%.
(2,1,1): 4 people, 2 identical, the other 2 different from those 2 and from each other; estimated accuracy 45.67351200835363%.
(3,1,1,1): 6 people, 3 identical, the other 3 different from those 3 and from one another; estimated accuracy 81.24386157490324%.
(3,2,1): 6 people, 3 identical; of the other 3, whose results differ from those 3, 2 are identical to each other; estimated accuracy 79.73229373797001%.
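Several of the figures above can be reproduced with a short Bayesian calculation. The sketch below is an assumption about how such values may be derived, not the patent's stated procedure: it uses a uniform prior over the ten characters, independent annotators, a = 0.9, and an error probability of (1-a)/9 for each specific wrong character. The function name `estimated_accuracy` is hypothetical. Under these assumptions the sketch matches the listed values for distributions with one or two distinct results, such as (2), (3), (2,1), (3,1), (3,2) and (4,2); it does not reproduce the listed values for distributions with three or more distinct results, such as (2,1,1), which presumably rest on assumptions not stated in the text.

```python
def estimated_accuracy(counts, a=0.9, alphabet=10):
    """Posterior probability that the most frequent answer is the true character.

    counts: vote counts per distinct answer, most frequent first, e.g. (2, 1).
    """
    e = (1 - a) / (alphabet - 1)  # probability of one specific wrong character
    n = sum(counts)

    def likelihood(k):
        # k of the n votes name the true character, the other n - k are wrong
        return a ** k * e ** (n - k)

    numerator = likelihood(counts[0])
    # Hypotheses: the truth is one of the observed answers,
    # or one of the characters nobody answered (zero matching votes).
    denominator = (sum(likelihood(c) for c in counts)
                   + (alphabet - len(counts)) * likelihood(0))
    return numerator / denominator

for dist in [(2,), (3,), (2, 1), (3, 1), (3, 2), (4, 2)]:
    print(dist, f"{100 * estimated_accuracy(dist):.6f}%")
```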
Assume the accuracy requirement is an accuracy above 99.99%. Combining the estimated accuracy of each distribution above with the accuracy requirement yields the following distribution algorithm model: (a) when x <= y, the candidate result is a confirmed invalid result; (b) when x = 2, the candidate result is a confirmed invalid result; (c) when x - y > 2, the candidate result is a confirmed valid result; (d) when x - y <= 2, the candidate result is a pending valid result. Here x denotes the occurrence count of the candidate result and y denotes the occurrence count of the interference result.
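Rules (a) to (d) can be written as a small decision function. The sketch below is illustrative only: the function name and the three return strings are this example's own labels for the three candidate-result types, and because rules (b) and (d) overlap (for example x = 2, y = 0), evaluating the rules in the order listed is an assumption.

```python
def classify_candidate(x, y):
    """Classify a candidate result from its vote count x and the interference vote count y.

    Rules are checked in the order (a), (b), (c), (d); that precedence is assumed.
    """
    if x <= y:        # (a) candidate does not outnumber the interference
        return "confirmed invalid"
    if x == 2:        # (b) only two votes for the candidate
        return "confirmed invalid"
    if x - y > 2:     # (c) clear margin
        return "confirmed valid"
    return "pending valid"  # (d) x - y <= 2

for x, y in [(5, 1), (3, 1), (2, 0), (3, 3)]:
    print((x, y), classify_candidate(x, y))
```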
Optionally, as one embodiment, when the candidate result is judged to be a confirmed valid result, the data to be annotated is determined to have been recognized successfully, and the candidate result is determined to be the annotation result of the data to be annotated.
Further, when the candidate result is judged to be a confirmed valid result, the number of times the at least one character of the data to be annotated has been correctly recognized and the number of times it has been misrecognized are updated according to the occurrence counts of the candidate result and of the interference result.
As an example, suppose 6 people recognize the data to be annotated, the candidate result occurs 5 times, the interference result occurs once, and the interference result recognizes the character 6 in the data to be annotated as the character 8. Then the correct-recognition count of the at least one character is increased by 5, and the count of character 6 being recognized as character 8 is increased by 1. This makes it easy to compute the probability that each character of the data to be annotated is recognized as another character (in other words, the inter-character recognition probability), so that the accuracy corresponding to each recognition-result distribution can later be determined more precisely. This raises the effective utilization rate of recognition results, reduces cost and enriches the sample set.
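The count update in this example can be sketched as below. The function name, the nested-dictionary layout `counts[true_char][seen_char]` and the sample string "1634" are hypothetical. One interpretive choice is also assumed: characters that the interference result got right are counted as correct recognitions too, which the text does not specify.

```python
from collections import defaultdict

def update_confusion_counts(counts, candidate, results):
    """Add one observation per character of every result that aligns with the candidate.

    counts[true_char][seen_char] is the number of times true_char was read as seen_char.
    """
    for result in results:
        if len(result) != len(candidate):
            continue  # misaligned results are ignored
        for true_char, seen_char in zip(candidate, result):
            counts[true_char][seen_char] += 1
    return counts

counts = defaultdict(lambda: defaultdict(int))
# Candidate "1634" appears 5 times; one interference result reads the 6 as an 8.
update_confusion_counts(counts, "1634", ["1634"] * 5 + ["1834"])
print(counts["6"]["6"], counts["6"]["8"])
```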
Optionally, as another embodiment, when the candidate result is judged to be a confirmed invalid result, the relationship between the number of recognition rounds already performed on the data to be annotated and a preset number of recognition rounds is judged. When the number of rounds already performed is less than the preset number, the recognition operation is performed again on the data to be annotated; the multiple recognition results are updated with the results of that operation, and the recognition status of the data to be annotated is judged according to the candidate result and the interference result among the updated recognition results. In other words, when the number of rounds already performed is less than the preset number, the steps of the method shown in Fig. 1 are repeated. Alternatively, when the number of rounds already performed equals the preset number, recognition of the data to be annotated is determined to have failed.
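The retry logic described above can be sketched as a loop. This is a simplified illustration under stated assumptions: `label_with_retries`, `run_round`, `simple_judge` and `scripted` are hypothetical names, and `simple_judge` stands in for the full judgment by accepting a candidate only when x - y > 2 and treating every other case as invalid.

```python
from collections import Counter

def label_with_retries(run_round, judge, preset_rounds):
    """run_round() returns one round of results; judge(results) returns a label or None."""
    results = []
    rounds_done = 0
    while rounds_done < preset_rounds:
        results.extend(run_round())  # update the pooled recognition results
        rounds_done += 1
        label = judge(results)       # None stands for "confirmed invalid"
        if label is not None:
            return "success", label
    return "failure", None           # preset number of rounds reached

def simple_judge(results):
    # Stand-in judgment (assumption): accept only when the vote margin exceeds 2.
    counts = Counter(results)
    candidate, x = counts.most_common(1)[0]
    y = len(results) - x
    return candidate if x - y > 2 else None

def scripted(rounds):
    # Replays pre-recorded rounds, for demonstration only.
    it = iter(rounds)
    return lambda: next(it)

rounds = [["12", "21", "12"], ["12", "12", "12"]]
two_round_outcome = label_with_retries(scripted(rounds), simple_judge, preset_rounds=2)
one_round_outcome = label_with_retries(scripted(rounds), simple_judge, preset_rounds=1)
print(two_round_outcome, one_round_outcome)
```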
Optionally, as another embodiment, when the candidate result is judged to be a pending valid result, whether the candidate result meets the accuracy requirement is judged. When the candidate result meets the accuracy requirement, the data to be annotated is determined to have been recognized successfully, and the candidate result is determined to be the annotation result of the data to be annotated.
Specifically, judging whether the candidate result meets the accuracy requirement includes: determining the recognition accuracy of the at least one character; and judging, according to the recognition accuracy of the at least one character and the accuracy requirement, whether the candidate result meets the accuracy requirement. That is, the data to be annotated is in effect split into individual characters (the candidate result being the recognition result for those characters), and each character is checked in turn against the accuracy requirement. It can be understood that the candidate result meets the accuracy requirement when the recognition accuracy of every character meets it; otherwise the candidate result does not meet the accuracy requirement.
In some embodiments, determining the recognition accuracy of the at least one character includes: judging the type of the at least one character according to the distribution algorithm model, the occurrence count of the candidate result and the occurrence count of the interference result, where the type of a character is one of confirmed valid character, confirmed invalid character and pending valid character; and, when the at least one character is judged to be a pending valid character, determining the recognition accuracy of the at least one character according to the candidate result.
Or it is to be understood that when the type of candidate result is effective result to be determined, it is thus necessary to determine that data to be marked In each character recognition accuracy whether meet accuracy rate requirement.In such a case, it is possible to true according to following mode The recognition accuracy of fixed each character:Current character is counted according to the occurrence number of the occurrence number of candidate result and interference result Ballot distribution situation, the type of current character is judged according to Distribution Algorithm model.If the type of current character, which is determination, to be had Character is imitated, then the ballot that character late is counted according to the occurrence number of the occurrence number of candidate result and interference result is distributed feelings Condition, and subsequent operation is executed according to the ballot distribution situation of character late;If the type of current character is to determine invalidation word Symbol, it is determined that candidate result is unsatisfactory for accuracy rate requirement, needs to restart to execute method shown in FIG. 1;If current character Type be significant character to be determined, it is determined that the recognition accuracy of current character, and further according to the appearance of candidate result The ballot distribution situation of number and the occurrence number statistics character late of interference result, and according to the ballot of character late point Cloth situation executes subsequent operation.
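The character-by-character check can be sketched as follows. The names `classify`, `candidate_meets_requirement` and the callback `char_accuracy` are hypothetical; `classify` reuses the x/y rules given earlier for candidate results, and how the accuracy of a pending character is computed is left to the callback because the sketch does not assume a particular formula.

```python
from collections import Counter

def classify(x, y):
    # Same x/y rules as for candidate results, checked in the listed order (assumed).
    if x <= y or x == 2:
        return "confirmed invalid"
    return "confirmed valid" if x - y > 2 else "pending valid"

def candidate_meets_requirement(candidate, results, char_accuracy, required):
    """Check every character position of the candidate against the accuracy requirement.

    char_accuracy(x, y) is a caller-supplied estimate for a pending valid character.
    """
    aligned = [r for r in results if len(r) == len(candidate)]
    for pos, ch in enumerate(candidate):
        votes = Counter(r[pos] for r in aligned)  # vote distribution of this character
        x = votes[ch]
        y = sum(votes.values()) - x
        kind = classify(x, y)
        if kind == "confirmed invalid":
            return False
        if kind == "pending valid" and char_accuracy(x, y) < required:
            return False
    return True

results = ["1234"] * 4 + ["1324"] * 2
strict = candidate_meets_requirement("1234", results, lambda x, y: 0.9998, 0.9999)
loose = candidate_meets_requirement("1234", results, lambda x, y: 0.9998, 0.98)
print(strict, loose)
```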
Optionally, as an example, determining the recognition accuracy of the at least one character includes: determining the conditional probability formula corresponding to the candidate result; and determining the recognition accuracy of the at least one character according to the conditional probability formula and a recognition probability matrix, where each element of the recognition probability matrix describes the probability that a character of the at least one character is recognized as another character of the at least one character.
For example, an element of the recognition probability matrix W can be written $W_{ij}$, which denotes the probability that character j is recognized as character i, and $W_{ij}$ can be determined by formula (1):
$$W_{ij} = \frac{C_{ij}}{\sum_{k} C_{kj}} \qquad (1)$$
In formula (1), $C_{ij}$ denotes the number of times character j has been recognized as character i, and $\sum_{k} C_{kj}$ denotes the total number of times character j has been recognized.
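As a sketch of how the recognition probability matrix could be derived from such counts, the function below divides each count by the total number of times the true character was recognized. The name `recognition_probabilities` and the `(i, j)` tuple keys, meaning "character j was recognized as character i", are assumptions made for this example.

```python
from collections import defaultdict

def recognition_probabilities(C):
    """Turn counts C[(i, j)] (character j recognized as i) into probabilities W[(i, j)]."""
    totals = defaultdict(int)
    for (i, j), n in C.items():
        totals[j] += n  # total number of times character j was recognized
    return {(i, j): n / totals[j] for (i, j), n in C.items() if totals[j] > 0}

# Character 6 was read correctly 5 times and read as 8 once.
C = {("6", "6"): 5, ("8", "6"): 1}
W = recognition_probabilities(C)
print(W)
```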
The conditional probability formula of the candidate result according to the embodiments of the present application is described below with a specific example.
Take the case where each round of recognition is performed by 3 people. If one round of recognition has been performed on the data to be annotated and the recognition results of the 3 people in that round are identical, the conditional probability for the case of 3 identical results can be expressed as formula (2) (the formula is not reproduced in this text).
Since $P(X_1=j, X_2=j, X_3=j \mid A=j)$ can be expressed as $P(X_1=j \mid A=j)\,P(X_2=j \mid A=j)\,P(X_3=j \mid A=j)$, formula (2) can be rewritten as formula (3) (the formula is not reproduced in this text).
Substituting the elements of the recognition probability matrix W into formula (3) gives the recognition accuracy of each character for the case where the 3 people's recognition results are identical. If the probability that character j is correctly recognized is assumed to be $W_{jj} = a$, then the probability that character j is not correctly recognized is $(1-a)$ (the intermediate expressions are not reproduced in this text). Assuming a = 90% and an accuracy requirement of more than 98%, it can be determined that when the 3 people's recognition results are identical the accuracy of every character meets the accuracy requirement. The recognition result of the 3 people therefore meets the accuracy requirement and can be determined to be the annotation result of the data to be annotated.
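Because formulas (2) and (3) are not reproduced in this text, the sketch below shows one plausible form only: a Bayes posterior with a uniform prior over the characters and independent annotators, so that the probability that the true character is j, given that all n annotators answered j, is $W_{jj}^n / \sum_k W_{jk}^n$. The function name and the uniform error model used to build W are assumptions. With a = 0.9 and ten characters the result is about 99.998%, which is above the 98% requirement mentioned in the text.

```python
def posterior_all_agree(W, j, n, chars):
    """P(true character = j | n annotators all answered j), assuming a uniform prior.

    W[(i, k)] is the probability that true character k is recognized as i.
    """
    numerator = W[(j, j)] ** n
    denominator = sum(W[(j, k)] ** n for k in chars)
    return numerator / denominator

chars = "0123456789"
a = 0.9
e = (1 - a) / 9  # assumed: errors spread evenly over the nine other characters
W = {(i, k): (a if i == k else e) for i in chars for k in chars}

p = posterior_all_agree(W, "6", 3, chars)
print(f"{100 * p:.6f}%")
```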
Further, suppose the recognition results of the above 3 people are inconsistent and a next round of recognition has been performed on the data to be annotated. If the case where one character is simultaneously misrecognized as several different characters is not considered, and the lower-accuracy recognition results are removed, the following 8 events can occur for a single character:
Event 1: √×××××
Event 2: √××√××
Event 3: √××√√×
Event 4: √××√√√
Event 5: √√××××
Event 6: √√×√××
Event 7: √√×√√×
Event 8: √√×√√√
Here √ and × denote different recognition results: √ means the recognition is correct and × means the recognition is wrong. Analyzing the 8 events yields 4 scenarios:
Scenario one: events 1 and 8 contain two recognition results whose occurrence counts are 5:1.
Scenario two: events 2 and 7 contain two recognition results whose occurrence counts are 4:2.
Scenario three: events 4 and 5 contain two recognition results whose occurrence counts are 4:2.
Scenario four: events 3 and 6 contain two recognition results whose occurrence counts are 3:3.
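The vote split of each event can be tallied mechanically, which confirms the ratios listed for the four scenarios. The dictionary `events` and the function `vote_split` are names chosen for this sketch; note that scenarios two and three share the same 4:2 split and differ only in how the votes fall across the two rounds.

```python
events = {
    1: "√×××××", 2: "√××√××", 3: "√××√√×", 4: "√××√√√",
    5: "√√××××", 6: "√√×√××", 7: "√√×√√×", 8: "√√×√√√",
}

def vote_split(pattern):
    """Return (larger group, smaller group) for the two recognition results in an event."""
    right = pattern.count("√")
    wrong = pattern.count("×")
    return max(right, wrong), min(right, wrong)

for number, pattern in events.items():
    print(number, pattern, vote_split(pattern))
```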
Specifically, the conditional probability formula for event 8 in scenario 1 is formula (4):
If the probability that character j is correctly recognized is assumed to be $W_{jj}=a$, then the probability that character j is not correctly recognized is $(1-a)$, and in turn […]. It is further assumed that, when recognizing character j as character i is a common error, i.e., apart from itself character j can only be recognized as character i, $W_{ij}=1-a$; that, when recognizing character j as character i is a rare error, i.e., character j is never recognized as character i, $W_{ij}\approx 0$; and that $a=90\%$. It can then be determined that the accuracy of the recognition result of event 8 is 99.97% for a common error and approximately 0 for a rare error.
Alternatively, the conditional probability formula for event 7 in scenario 2 is formula (5):
If the probability that character j is correctly recognized is assumed to be $W_{jj}=a$, then the probability that character j is not correctly recognized is $(1-a)$, and in turn […]. It is further assumed that, when recognizing character j as character i is a common error, i.e., apart from itself character j can only be recognized as character i, $W_{ij}=1-a$; that, when recognizing character j as character i is a rare error, i.e., character j is never recognized as character i, $W_{ij}\approx 0$; and that $a=90\%$. It can then be determined that the accuracy of the recognition result of event 7 is 98.662% for a common error and approximately 0 for a rare error.
Alternatively, the conditional probability formula for event 3 in scenario 4 is formula (6):
It is further assumed that, when recognizing character j as character i is a common error, i.e., apart from itself character j can only be recognized as character i, $W_{ij}=1-a$; that, when recognizing character j as character i is a rare error, i.e., character j is never recognized as character i, $W_{ij}\approx 0$; and that $a=90\%$. It can then be determined that the accuracy of the recognition result of event 3 is 49.727% for a common error and approximately 0 for a rare error.
Therefore, ignoring the case in which a character is misrecognized as several different characters, when the error that occurs is a common error, scenarios 1, 2 and 3 are scenarios in which the accuracy of the recognition result meets the requirement, that is, acceptable scenarios. When the error that occurs is a rare error, scenarios 1, 2 and 3 are unacceptable scenarios. Scenario 4 is unacceptable regardless of whether the error is a common error.
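The common-error figures for events 8, 7 and 3 can be checked numerically with a short sketch. It is not taken from the application: it assumes a uniform prior over ten digit classes and a worst case in which every off-diagonal entry of W equals $1-a$. Under those assumptions it gives values that agree with the quoted 99.97%, 98.662% and 49.727%. The names `worst_case_matrix` and `candidate_posterior` are hypothetical.

```python
def worst_case_matrix(a, num_classes=10):
    """Build W with W[seen][true] = a on the diagonal and 1 - a elsewhere.

    Assumption: this is a worst-case bound, so the columns of W do not sum to 1.
    """
    return [
        [a if seen == true else 1.0 - a for true in range(num_classes)]
        for seen in range(num_classes)
    ]


def candidate_posterior(W, votes, candidate):
    """Posterior probability that `candidate` is the true character.

    `votes` maps each recognized character index to its occurrence count.
    Assumes a uniform prior over the true character.
    """
    def likelihood(true_char):
        p = 1.0
        for seen, count in votes.items():
            p *= W[seen][true_char] ** count
        return p

    total = sum(likelihood(true_char) for true_char in range(len(W)))
    return likelihood(candidate) / total


W = worst_case_matrix(0.9)
j, i = 0, 1  # j: true character, i: the character it is commonly mistaken for
event_8 = candidate_posterior(W, {j: 5, i: 1}, j)  # 5 correct, 1 wrong
event_7 = candidate_posterior(W, {j: 4, i: 2}, j)  # 4 correct, 2 wrong
event_3 = candidate_posterior(W, {j: 3, i: 3}, j)  # 3 correct, 3 wrong
```

In practice the entries of W would come from accumulated recognition statistics rather than from this worst-case construction.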
Further, consider the case in which one character is misrecognized as several different characters. For example, if 4 people recognize the character correctly and 2 people recognize it incorrectly with different incorrect results, the same character has 3 recognition results whose occurrence counts are (4, 1, 1). The recognition result with the most occurrences is determined as the candidate result, and the conditional probability formula corresponding to the candidate result is formula (7):
The term […] in formula (7) can be expressed by formula (8):
Applying the mean inequality to the last two terms of formula (8) gives:
It then follows that:
When $i_1$ and $i_2$ are both common errors, […], then:
Assuming $a=0.9$, the accuracy of the candidate result is greater than 99.485% when $i_1$ and $i_2$ are both common errors, and when $i_1$ and $i_2$ are both rare errors, […], the accuracy of the candidate result is approximately 0.
It should be understood that the above description of the conditional probability formula for the candidate result, and of determining accuracy from that formula, considers the extreme worst case. In practice, substituting the elements of the recognition probability matrix W into the specific conditional probability formula gives the specific recognition accuracy of each character in the data to be labeled, and whether the candidate result is valid can be determined from those accuracies.
In the embodiments of the present application, in order to obtain an accurate recognition accuracy for the at least one character, before the recognition accuracy of the at least one character is determined according to the conditional probability formula and the recognition probability matrix, it is determined that the total number of times the at least one character has been recognized is greater than or equal to a preset recognition count. For example, if the preset recognition count is 100, then once the total number of times the at least one character has been recognized reaches 100 or more, the elements of the recognition probability matrix can be substituted into the conditional probability formula corresponding to the candidate result to obtain the recognition accuracy of the at least one character. If the recognition accuracy of the at least one character meets the accuracy requirement, it is determined that the candidate result meets the accuracy requirement. In this case, the number of times the at least one character included in the data to be labeled is correctly recognized and the number of times it is misrecognized are not updated according to the occurrence counts of the candidate result and the interference results.
Fig. 2 is a schematic flowchart of a method for data labeling according to a specific embodiment of the present application. The method of Fig. 2 may be executed by a device for data labeling. As shown in Fig. 2, the method comprises two main processes: distribution algorithm model training and recognition status judgment. The distribution algorithm model training process comprises S202 to S210, and the recognition status judgment process comprises S212 to S234, wherein:
At S202, the accuracy requirement is determined.
At S204, the total number of manual recognition rounds and the number of distinct participants in each round are determined.
Specifically, the total number of manual recognition rounds and the number of distinct participants in each round may be determined according to timeliness requirements.
At S206, the cumulative number of participants is counted, and all result distributions that may occur are analyzed.
Here, the distribution of all possible results can be understood as the occurrence counts in the method shown in Fig. 1.
It should be understood that, at S206, when manual recognition has been performed a different number of times, the cumulative number of participants differs, and so do the result distributions that may occur. In other words, multiple result distributions are obtained at S206.
At S208, the estimated accuracy of each result distribution is determined.
Specifically, the estimated accuracy of each result distribution may be determined according to the conditional probability formula of that result distribution.
At S210, the distribution algorithm model is determined by combining the accuracy requirement with the estimated accuracy of each result distribution.
At S212, it is judged whether the number of manual recognition rounds already performed has reached the required total number of rounds.
At S214, if the number of manual recognition rounds already performed has not reached the required total number of rounds, one round of manual recognition is performed, and the recognition results of all rounds so far are tallied to obtain multiple recognition results.
Optionally, if it is judged at S212 that the number of manual recognition rounds already performed has reached the required total number of rounds, recognition failure is determined at S214.
At S216, the occurrence count of each recognition result is counted, and the candidate result and the interference results are determined.
Optionally, the recognition result with the most occurrences is determined as the candidate result; recognition results among the multiple recognition results whose length is the same as that of the candidate result are determined as interference results; and the remaining recognition results, other than the candidate result and the interference results, are regarded as misaligned and discarded.
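The selection at S216 can be sketched as follows. This is a minimal illustration, assuming each recognition result is a string; the function name `split_results` is hypothetical, and the tie-breaking rule (the result seen first wins when occurrence counts are equal) is an assumption that the text does not specify.

```python
from collections import Counter


def split_results(results):
    """Split recognition results into candidate, interference and discarded sets.

    Returns (candidate, candidate_count, interference, discarded), where the
    last two are dicts mapping a recognition result to its occurrence count.
    """
    counts = Counter(results)
    # Assumption: on a tie, the result encountered first is chosen.
    candidate, candidate_count = counts.most_common(1)[0]
    interference = {}
    discarded = {}
    for result, count in counts.items():
        if result == candidate:
            continue
        if len(result) == len(candidate):
            interference[result] = count   # same length: interference result
        else:
            discarded[result] = count      # different length: misaligned, dropped
    return candidate, candidate_count, interference, discarded


candidate, votes, interference, discarded = split_results(
    ["1234", "1234", "1284", "123", "1234"]
)
```

Keeping only same-length results means every interference result can later be compared with the candidate result position by position.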
At S218, it is judged according to the distribution algorithm model whether the candidate result is a determined valid result.
At S220, if the candidate result is a determined valid result, per-character statistics are compiled for the candidate result and the interference results, and the inter-character recognition probability matrix is accumulated.
At S220, an element of the inter-character recognition probability matrix describes the probability that one character is recognized as another character. Per-character statistics can be understood as counting, from the occurrence counts of the candidate result and of the interference results, the number of times each character is correctly recognized and the number of times it is misrecognized. Here, the number of times a character is misrecognized refers specifically to the number of times it is recognized as a particular other character, for example the number of times 8 is recognized as 6 or the number of times 8 is recognized as 3.
Further, at S220, if the candidate result is a determined valid result, recognition is determined to be successful and the recognition result is returned. It should be understood that the recognition result returned here is the candidate result.
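The per-character statistics at S220 can be sketched as follows. This is a minimal illustration, assuming the candidate result is treated as the true text and every interference result has the same length; the names `new_counts`, `accumulate` and `recognition_probability` are hypothetical. The table is indexed as `counts[true_char][seen_char]`.

```python
from collections import defaultdict


def new_counts():
    """Create an empty table: counts[true_char][seen_char] -> number of times."""
    return defaultdict(lambda: defaultdict(int))


def accumulate(counts, candidate, candidate_votes, interference):
    """Add one labeled item to the table, position by position."""
    for position, true_char in enumerate(candidate):
        # Every vote for the candidate result recognized this character correctly.
        counts[true_char][true_char] += candidate_votes
        # An interference result is correct at this position only if it agrees.
        for result, votes in interference.items():
            counts[true_char][result[position]] += votes


def recognition_probability(counts, true_char, seen_char):
    """Estimate the probability that true_char is recognized as seen_char."""
    total = sum(counts[true_char].values())
    return counts[true_char][seen_char] / total if total else 0.0


counts = new_counts()
accumulate(counts, "18", 4, {"16": 1, "13": 1})
p_8_as_6 = recognition_probability(counts, "8", "6")
```

Normalizing each row of the count table in this way gives one possible estimate of the entries of the recognition probability matrix.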
At S222, if the candidate result is not a determined valid result, it is determined according to the distribution algorithm model whether the candidate result is a determined invalid result.
At S224, if the candidate result is not a determined invalid result, the candidate result is traversed character by character, and it is judged whether the traversal is complete.
At S224, if the candidate result is a determined invalid result, S212 and its subsequent steps are executed.
At S224, if the traversal is complete, recognition is determined to be successful and the recognition result is returned. It should be understood that the recognition result returned here is the candidate result.
At S226, if the traversal is not complete, the vote distribution of the current character is counted according to the candidate result and the interference results.
At S228, it is judged according to the distribution algorithm model whether the current character is a determined valid character.
At S230, if the current character is not a determined valid character, it is determined according to the distribution algorithm model whether the current character is a determined invalid character.
Optionally, at S230, if the current character is a determined valid character, S224 and its subsequent steps are executed.
At S232, if the current character is not a determined invalid character, the recognition accuracy of the current character is determined according to the inter-character recognition probability matrix and the conditional probability formula corresponding to the candidate result.
Optionally, at S232, if the current character is a determined invalid character, S212 and its subsequent steps are executed.
At S234, it is judged whether the recognition accuracy of the current character meets the accuracy requirement; if so, S224 and its subsequent steps are executed; otherwise, S212 and its subsequent steps are executed.
It should be noted that, although S222 is placed after S218 in the above embodiment for ease of description, in practice the order of these two steps can be exchanged. That is, it is first determined whether the candidate result is a determined invalid result; if it is not, it is then determined whether the candidate result is a determined valid result; if it is a determined valid result, S220 is executed, and if it is not, S224 is executed. Likewise, although S230 is placed after S228, in practice the order of these two steps can be exchanged. That is, it is first determined whether the current character is a determined invalid character; if it is not, it is then determined whether the current character is a determined valid character; if it is a determined valid character, S224 and its subsequent steps are executed, and if it is not, the recognition accuracy of the current character is determined according to the inter-character recognition probability matrix and the conditional probability formula corresponding to the candidate result.
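The control flow of S212 to S234 can be sketched as follows. This is a simplified illustration rather than the application's implementation: the distribution algorithm model and the accuracy computation are passed in as hypothetical callables (`classify`, `char_accuracy`), each round is assumed to return at least one result, and "meets the accuracy requirement" is assumed to mean greater than or equal to `required_accuracy`.

```python
from collections import Counter
from typing import Callable, List, Optional

VALID, INVALID, PENDING = "valid", "invalid", "pending"


def char_votes(results: Counter, position: int) -> Counter:
    """S226: vote distribution of the character at the given position."""
    votes: Counter = Counter()
    for result, count in results.items():
        votes[result[position]] += count
    return votes


def label_item(
    collect_round: Callable[[], List[str]],   # one round of manual recognition
    classify: Callable[[Counter], str],       # stand-in for the distribution model
    char_accuracy: Callable[[Counter], float],
    required_accuracy: float,
    max_rounds: int,
) -> Optional[str]:
    """Return the labeling result, or None when recognition fails."""
    results: List[str] = []
    for _ in range(max_rounds):                        # S212
        results.extend(collect_round())                # S214
        counts = Counter(results)                      # S216
        candidate = counts.most_common(1)[0][0]
        kept = Counter(
            {r: n for r, n in counts.items() if len(r) == len(candidate)}
        )
        kind = classify(kept)                          # S218 / S222
        if kind == VALID:
            return candidate                           # S220
        if kind == INVALID:
            continue                                   # back to S212
        for position in range(len(candidate)):         # S224
            votes = char_votes(kept, position)         # S226
            char_kind = classify(votes)                # S228 / S230
            if char_kind == VALID:
                continue
            if char_kind == INVALID or char_accuracy(votes) < required_accuracy:
                break                                  # S232 / S234: back to S212
        else:
            return candidate                           # traversal complete
    return None                                        # total rounds exhausted
```

Passing the model in as a callable keeps the loop independent of how the three-way classification is trained; the per-character statistics update at S220 is omitted for brevity.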
The method for data labeling according to the embodiments of the present application has been described in detail above with reference to Fig. 1 and Fig. 2. An electronic device according to an embodiment of the present application is described in detail below with reference to Fig. 3. Referring to Fig. 3, at the hardware level the electronic device includes a processor and, optionally, an internal bus, a network interface and a memory. The memory may include internal memory, such as high-speed random-access memory (RAM), and may further include non-volatile memory, such as at least one disk storage device. The electronic device may of course also include hardware required by other services.
The processor, the network interface and the memory may be interconnected through the internal bus, which may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one double-headed arrow is shown in Fig. 3, but this does not mean that there is only one bus or one type of bus.
The memory is used to store a program. Specifically, the program may include program code, and the program code includes computer operation instructions. The memory may include internal memory and non-volatile memory, and provides instructions and data to the processor.
The processor reads the corresponding computer program from the non-volatile memory into internal memory and then runs it, forming the device for data labeling at the logical level. The processor executes the program stored in the memory and is specifically configured to perform the following operations:
obtaining multiple recognition results produced by performing multiple rounds of recognition on data to be labeled, the data to be labeled including at least one character;
when it is determined that different recognition results exist among the multiple recognition results, determining a candidate result and an interference result among the multiple recognition results;
judging the recognition status of the data to be labeled according to the candidate result and the interference result, the recognition status including recognition success and recognition failure.
The method executed by the device for data labeling disclosed in the embodiments shown in Fig. 1 and Fig. 2 of the present application may be applied to, or implemented by, a processor. The processor may be an integrated circuit chip with signal processing capability. In implementation, each step of the above method may be completed by an integrated logic circuit of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or execute the methods, steps and logic block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present application may be executed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well established in the art, such as random-access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register. The storage medium is located in the memory; the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
The electronic device may also execute the method of Fig. 2 and implement the functions of the device for data labeling in the embodiment shown in Fig. 2, which are not repeated here.
Of course, in addition to a software implementation, the electronic device of the present application does not exclude other implementations, such as logic devices or a combination of software and hardware. That is, the execution body of the following processing flow is not limited to logic units and may also be hardware or a logic device.
The embodiments of the present application also provide a computer-readable storage medium storing one or more programs. The one or more programs include instructions which, when executed by an electronic device that includes multiple application programs, cause the electronic device to execute the method of the embodiments shown in Fig. 1 and Fig. 2, and specifically to execute the following method:
obtaining multiple recognition results produced by performing multiple rounds of recognition on data to be labeled, the data to be labeled including at least one character;
when it is determined that different recognition results exist among the multiple recognition results, determining a candidate result and an interference result among the multiple recognition results;
judging the recognition status of the data to be labeled according to the candidate result and the interference result, the recognition status including recognition success and recognition failure.
Fig. 4 is a schematic structural diagram of a device for data labeling according to one embodiment of the present application. Referring to Fig. 4, in a software implementation, the device 400 for data labeling may include an acquiring unit 401, a determining unit 402 and a judging unit 403, wherein:
the acquiring unit 401 obtains multiple recognition results produced by performing multiple rounds of recognition on data to be labeled, the data to be labeled including at least one character;
the determining unit 402, when it is determined that different recognition results exist among the multiple recognition results, determines a candidate result and an interference result among the multiple recognition results;
the judging unit 403 judges the recognition status of the data to be labeled according to the candidate result and the interference result, the recognition status including recognition success and recognition failure.
According to the device for data labeling of the embodiments of the present application, when different recognition results exist among the multiple recognition results of the data to be labeled, the recognition status of the data to be labeled can be judged according to the candidate result and the interference result among the multiple recognition results. This avoids the problem that the effective utilization of recognition results cannot be ensured when the recognition status can be judged only if all of the multiple recognition results are identical, and thereby ensures the effective utilization of the recognition results.
Optionally, as one embodiment, the determining unit 402:
determines the candidate result and the interference result according to occurrence counts, wherein the candidate result is the recognition result with the most occurrences among the multiple recognition results, and the interference result is a recognition result among the multiple recognition results other than the candidate result.
Optionally, as one embodiment, the number of characters included in the interference result is the same as the number of characters included in the candidate result.
Optionally, as one embodiment, the judging unit 403:
judges the type of the candidate result according to a distribution algorithm model, the occurrence count of the candidate result and the occurrence count of the interference result, wherein the type of the candidate result includes a determined valid result, a determined invalid result and a to-be-determined valid result, and the distribution algorithm model is obtained by training based on an accuracy requirement and the recognition results of training data to be labeled;
judges the recognition status of the data to be labeled according to the type of the candidate result.
Optionally, as one embodiment, the judging unit 403:
when the type of the candidate result is judged to be a determined valid result, determines that recognition of the data to be labeled has succeeded, and determines the candidate result as the labeling result of the data to be labeled.
Optionally, as one embodiment, when the type of the candidate result is judged to be a determined valid result, the judging unit 403:
updates, according to the number of times the candidate result occurs and the number of times the interference result occurs, the number of times the at least one character included in the data to be labeled is correctly recognized and the number of times it is misrecognized.
Optionally, as one embodiment, the judging unit 403:
when the type of the candidate result is judged to be a determined invalid result, judges the relationship between the number of rounds of the recognition operation already performed on the data to be labeled and a preset number of recognition rounds;
when the number of rounds of the recognition operation already performed on the data to be labeled is judged to be less than the preset number of recognition rounds, performs the recognition operation on the data to be labeled again;
updates the multiple recognition results according to the result of performing the recognition operation again on the data to be labeled, and judges the recognition status of the data to be labeled according to the candidate result and the interference result among the updated multiple recognition results.
Optionally, as one embodiment, the judging unit 403:
when the number of rounds of the recognition operation already performed on the data to be labeled is judged to be equal to the preset number of recognition rounds, determines that recognition of the data to be labeled has failed.
Optionally, as one embodiment, the judging unit 403:
when the type of the candidate result is judged to be a to-be-determined valid result, judges whether the candidate result meets the accuracy requirement;
when the candidate result is judged to meet the accuracy requirement, determines that recognition of the data to be labeled has succeeded, and determines the candidate result as the labeling result of the data to be labeled.
Optionally, as one embodiment, the judging unit 403:
determines the recognition accuracy of the at least one character;
judges, according to the recognition accuracy of the at least one character and the accuracy requirement, whether the candidate result meets the accuracy requirement.
Optionally, as one embodiment, the judging unit 403:
judges the type of the at least one character according to the distribution algorithm model, the occurrence count of the candidate result and the occurrence count of the interference result, the type of the at least one character including a determined valid character, a determined invalid character and a to-be-determined valid character;
when the at least one character is judged to be a to-be-determined valid character, determines the recognition accuracy of the at least one character.
Optionally, as one embodiment, the at least one character is at least two characters;
wherein the judging unit 403:
determines the conditional probability formula corresponding to the candidate result;
determines the recognition accuracy of the at least one character according to the conditional probability formula and a recognition probability matrix, wherein an element of the recognition probability matrix describes the probability that one character of the at least one character is recognized as another character of the at least one character.
Optionally, as one embodiment, before the recognition accuracy of the at least one character is determined according to the conditional probability formula and the recognition probability matrix, the judging unit 403:
determines that the total number of times the at least one character has been recognized is greater than or equal to a preset recognition count.
The device 400 for data labeling may also execute the method of the embodiments shown in Fig. 1 and Fig. 2 and implement the functions of the device for data labeling in those embodiments, which are not repeated here.
In summary, the above are merely preferred embodiments of the present application and are not intended to limit its scope of protection. Any modification, equivalent replacement or improvement made within the spirit and principles of the present application shall fall within its scope of protection.
The systems, devices, modules or units described in the above embodiments may be implemented by a computer chip or entity, or by a product having a certain function. A typical implementation device is a computer. Specifically, the computer may be, for example, a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, e-mail device, game console, tablet computer, wearable device, or a combination of any of these devices.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, commodity or device that includes the element.
The embodiments in this specification are described in a progressive manner; identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiment is described relatively briefly because it is substantially similar to the method embodiment; for relevant details, refer to the description of the method embodiment.

Claims (16)

1. A method for data labeling, comprising:
obtaining multiple recognition results produced by performing multiple rounds of recognition on data to be labeled, the data to be labeled including at least one character;
when it is determined that different recognition results exist among the multiple recognition results, determining a candidate result and an interference result among the multiple recognition results;
judging the recognition status of the data to be labeled according to the candidate result and the interference result, the recognition status including recognition success and recognition failure.
2. The method according to claim 1, wherein determining the candidate result and the interference result among the multiple recognition results comprises:
determining the candidate result and the interference result according to occurrence counts, wherein the candidate result is the recognition result with the most occurrences among the multiple recognition results, and the interference result is a recognition result among the multiple recognition results other than the candidate result.
3. The method according to claim 2, wherein the number of characters included in the interference result is the same as the number of characters included in the candidate result.
4. The method according to claim 2 or 3, wherein judging the recognition status of the data to be labeled according to the candidate result and the interference result comprises:
judging the type of the candidate result according to a distribution algorithm model, the occurrence count of the candidate result and the occurrence count of the interference result, wherein the type of the candidate result includes a determined valid result, a determined invalid result and a to-be-determined valid result, and the distribution algorithm model is obtained by training based on an accuracy requirement and the recognition results of training data to be labeled;
judging the recognition status of the data to be labeled according to the type of the candidate result.
5. The method according to claim 4, wherein judging the recognition status of the data to be labeled according to the type of the candidate result comprises:
when the type of the candidate result is judged to be a determined valid result, determining that recognition of the data to be labeled has succeeded, and determining the candidate result as the labeling result of the data to be labeled.
6. The method according to claim 5, further comprising, when the type of the candidate result is judged to be a determined valid result:
updating, according to the number of times the candidate result occurs and the number of times the interference result occurs, the number of times the at least one character included in the data to be labeled is correctly recognized and the number of times it is misrecognized.
7. The method according to claim 5, further comprising:
When the type of the candidate result is judged to be a determined invalid result, judging the relationship between the number of rounds of recognition operations already performed on the data to be marked and a preset number of recognition rounds;
When the number of rounds of recognition operations performed on the data to be marked is judged to be less than the preset number of recognition rounds, performing the recognition operation on the data to be marked again;
Updating the multiple recognition results according to the result of performing the recognition operation on the data to be marked again, and judging the recognition status of the data to be marked according to the candidate result and the interference result in the updated multiple recognition results.
8. The method according to claim 7, further comprising:
When the number of rounds of recognition operations performed on the data to be marked is judged to be equal to the preset number of recognition rounds, determining that recognition of the data to be marked has failed.
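The control flow of claims 5, 7 and 8 can be sketched as a bounded retry loop. The callables `recognize_round` and `classify` are hypothetical placeholders for one round of recognition and for the type judgment; the sketch assumes every round returns at least one result and that `max_rounds` is at least 1.

```python
from collections import Counter

def label_with_retries(recognize_round, classify, max_rounds):
    """recognize_round() returns the results of one round (hypothetical callable).
    classify(candidate_count, interference_count) returns
    'valid', 'invalid' or 'pending'."""
    results = []
    for round_no in range(1, max_rounds + 1):
        results.extend(recognize_round())          # update the multiple recognition results
        counts = Counter(results)
        candidate, c = counts.most_common(1)[0]
        kind = classify(c, len(results) - c)
        if kind == "valid":
            return {"status": "success", "label": candidate, "rounds": round_no}
        if kind == "pending":
            # left to a separate accuracy check, not covered in this sketch
            return {"status": "pending", "label": candidate, "rounds": round_no}
        # kind == "invalid": recognize again while rounds remain
    return {"status": "failure", "label": None, "rounds": max_rounds}

batches = iter([["8B3", "883", "6B3"], ["8B3", "8B3", "8B3"]])
outcome = label_with_retries(
    recognize_round=lambda: next(batches),
    classify=lambda c, i: "valid" if c / (c + i) >= 0.6 else "invalid",
    max_rounds=3,
)
```

In this example the first round is inconclusive, and the second round raises the candidate's share enough for the hypothetical classifier to accept it.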
9. The method according to claim 4, further comprising:
When the type of the candidate result is judged to be a to-be-determined valid result, judging whether the candidate result meets the accuracy requirement;
When the candidate result is judged to meet the accuracy requirement, determining that the data to be marked is recognized successfully, and determining the candidate result as the annotation result of the data to be marked.
10. The method according to claim 9, wherein judging whether the candidate result meets the accuracy requirement comprises:
Determining the recognition accuracy of the at least one character;
Judging whether the candidate result meets the accuracy requirement according to the recognition accuracy of the at least one character and the accuracy requirement.
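The claim does not state how per-character accuracies are combined into a judgment on the whole candidate. The sketch below assumes, purely for illustration, that characters are recognized independently, so the candidate's accuracy is the product of the per-character values; the function name and sample figures are hypothetical.

```python
import math

def candidate_meets_requirement(char_accuracies, required_accuracy):
    # Assumption: characters are recognized independently, so the candidate is
    # right only if every character is right; multiply the per-character values.
    candidate_accuracy = math.prod(char_accuracies)
    return candidate_accuracy >= required_accuracy

ok = candidate_meets_requirement([0.99, 0.98, 0.995], 0.95)
not_ok = candidate_meets_requirement([0.99, 0.90, 0.995], 0.95)
```

Under this assumption a single weak character is enough to pull the whole candidate below the requirement.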
11. The method according to claim 10, wherein determining the recognition accuracy of the at least one character comprises:
Judging the type of the at least one character according to the distribution algorithm model, the number of occurrences of the candidate result and the number of occurrences of the interference result, wherein the type of the at least one character includes a determined valid character, a determined invalid character and a to-be-determined valid character;
When the at least one character is judged to be a to-be-determined valid character, determining the recognition accuracy of the at least one character.
12. The method according to claim 11, wherein the at least one character is at least two characters;
Wherein determining the recognition accuracy of the at least one character comprises:
Determining the conditional probability formula corresponding to the candidate result;
Determining the recognition accuracy of the at least one character according to the conditional probability formula and a recognition probability matrix, wherein an element of the recognition probability matrix describes the probability that one character of the at least one character is recognized as another character of the at least one character.
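The claim does not spell out the conditional probability formula. One common reading is a Bayes update over the competing characters at a position, using the recognition probability matrix as a confusion matrix; the sketch below follows that reading as an assumption. The names `posterior_true_char` and `confusion`, the uniform prior, and the sample probabilities are all hypothetical.

```python
def posterior_true_char(observed_counts, confusion, prior=None):
    """observed_counts: {char: times observed at this position}.
    confusion[t][o]: probability that true character t is recognized as o.
    Returns P(true character = t | observations) for each observed character."""
    chars = list(observed_counts)
    prior = prior or {t: 1.0 / len(chars) for t in chars}  # uniform prior by default
    weight = {}
    for t in chars:
        p = prior[t]
        for o, n in observed_counts.items():
            p *= confusion[t][o] ** n   # likelihood of seeing o n times if truth is t
        weight[t] = p
    total = sum(weight.values())        # assumed non-zero in this sketch
    return {t: p / total for t, p in weight.items()}

confusion = {
    "8": {"8": 0.9, "B": 0.1},
    "B": {"8": 0.2, "B": 0.8},
}
posterior = posterior_true_char({"B": 3, "8": 1}, confusion)
```

With these sample values, three readings of "B" against one reading of "8" leave "B" as the far more probable true character.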
13. The method according to claim 12, further comprising, before determining the recognition accuracy of the at least one character according to the conditional probability formula and the recognition probability matrix:
Determining that the total number of times the at least one character has been recognized is greater than or equal to a preset number of recognitions.
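The precondition on the total number of recognitions can be sketched as a simple guard. The `stats` layout (correct count, incorrect count per character) and the function name are hypothetical; the idea is only that matrix-based estimates are used once each character involved has been observed often enough.

```python
def enough_observations(stats, chars, min_total):
    # stats[ch] = (times correctly recognized, times incorrectly recognized);
    # characters never seen count as zero observations
    return all(sum(stats.get(ch, (0, 0))) >= min_total for ch in chars)

stats = {"8": (40, 2), "B": (5, 1)}
ready_both = enough_observations(stats, "8B", 10)
ready_eight = enough_observations(stats, "8", 10)
```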
14. A device for data marking, comprising:
An acquiring unit, which obtains multiple recognition results after multiple rounds of recognition are performed on data to be marked, the data to be marked including at least one character;
A determination unit, which, when determining that different recognition results exist among the multiple recognition results, determines a candidate result and an interference result among the multiple recognition results;
A judging unit, which judges the recognition status of the data to be marked according to the candidate result and the interference result, the recognition status including recognition success and recognition failure.
15. An electronic device, comprising:
A processor; and
A memory arranged to store computer-executable instructions which, when executed, cause the processor to perform the following operations:
Obtaining multiple recognition results after multiple rounds of recognition are performed on data to be marked, the data to be marked including at least one character;
When determining that different recognition results exist among the multiple recognition results, determining a candidate result and an interference result among the multiple recognition results;
Judging the recognition status of the data to be marked according to the candidate result and the interference result, the recognition status including recognition success and recognition failure.
16. A computer-readable medium storing one or more programs which, when executed by an electronic device including multiple application programs, cause the electronic device to perform the following operations:
Obtaining multiple recognition results after multiple rounds of recognition are performed on data to be marked, the data to be marked including at least one character;
When determining that different recognition results exist among the multiple recognition results, determining a candidate result and an interference result among the multiple recognition results;
Judging the recognition status of the data to be marked according to the candidate result and the interference result, the recognition status including recognition success and recognition failure.
CN201810115780.2A 2018-02-06 2018-02-06 Method and device for data annotation and electronic equipment Active CN108446695B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210197077.7A CN114677681A (en) 2018-02-06 2018-02-06 Method and device for data annotation and electronic equipment
CN201810115780.2A CN108446695B (en) 2018-02-06 2018-02-06 Method and device for data annotation and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810115780.2A CN108446695B (en) 2018-02-06 2018-02-06 Method and device for data annotation and electronic equipment

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202210197077.7A Division CN114677681A (en) 2018-02-06 2018-02-06 Method and device for data annotation and electronic equipment

Publications (2)

Publication Number Publication Date
CN108446695A true CN108446695A (en) 2018-08-24
CN108446695B CN108446695B (en) 2022-02-11

Family

ID=63191916

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201810115780.2A Active CN108446695B (en) 2018-02-06 2018-02-06 Method and device for data annotation and electronic equipment
CN202210197077.7A Pending CN114677681A (en) 2018-02-06 2018-02-06 Method and device for data annotation and electronic equipment

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202210197077.7A Pending CN114677681A (en) 2018-02-06 2018-02-06 Method and device for data annotation and electronic equipment

Country Status (1)

Country Link
CN (2) CN108446695B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110147852A (en) * 2019-05-29 2019-08-20 北京达佳互联信息技术有限公司 Method, apparatus, equipment and the storage medium of image recognition

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103164471A (en) * 2011-12-15 2013-06-19 盛乐信息技术(上海)有限公司 Recommendation method and system of video text labels
CN104795077A (en) * 2015-03-17 2015-07-22 北京航空航天大学 Voice annotation quality consistency detection method
CN105404896A (en) * 2015-11-03 2016-03-16 北京旷视科技有限公司 Annotation data processing method and annotation data processing system
CN105426826A (en) * 2015-11-09 2016-03-23 张静 Tag noise correction based crowd-sourced tagging data quality improvement method
CN105975980A (en) * 2016-04-27 2016-09-28 百度在线网络技术(北京)有限公司 Method of monitoring image mark quality and apparatus thereof
US20170316014A1 (en) * 2016-03-07 2017-11-02 International Business Machines Corporation Evaluating quality of annotation

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2797721B2 (en) * 1991-01-08 1998-09-17 日本電気株式会社 Character recognition device
JP2980059B2 (en) * 1997-05-19 1999-11-22 日本電気株式会社 Character recognition method and apparatus, and recording medium storing character recognition program
JP2000215313A (en) * 1999-01-22 2000-08-04 Mitsubishi Electric Corp Method and device identifying data
CN104598937B (en) * 2015-01-22 2019-03-12 百度在线网络技术(北京)有限公司 The recognition methods of text information and device
CN104766077B (en) * 2015-04-03 2017-04-12 北京奇虎科技有限公司 Method and device for recognizing characters in picture

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110147852A (en) * 2019-05-29 2019-08-20 北京达佳互联信息技术有限公司 Method, apparatus, equipment and the storage medium of image recognition
US11263483B2 (en) 2019-05-29 2022-03-01 Beijing Dajia Internet Information Technology Co., Ltd. Method and apparatus for recognizing image and storage medium

Also Published As

Publication number Publication date
CN114677681A (en) 2022-06-28
CN108446695B (en) 2022-02-11

Similar Documents

Publication Publication Date Title
CN108615119B (en) Abnormal user identification method and equipment
CN110795657B (en) Article pushing and model training method and device, storage medium and computer equipment
CN109766902A (en) To the method, apparatus and equipment of the vehicle cluster in same region
CN107423278A (en) The recognition methods of essential elements of evaluation, apparatus and system
CN107273195A (en) A kind of batch processing method of big data, device and computer system
CN104850556B (en) A kind of method and device of data processing
CN111597548B (en) Data processing method and device for realizing privacy protection
CN110209729B (en) Method and device for identifying data transfer object
US8577825B2 (en) System, method and device for solving problems in NP without hyper-polynomial cost
CN112528616A (en) Business form generation method and device, electronic equipment and computer storage medium
CN109271611A (en) A kind of data verification method, device and electronic equipment
CN115756780A (en) Quantum computing task scheduling method and device, computer equipment and storage medium
US11609897B2 (en) Methods and systems for improved search for data loss prevention
CN115344805A (en) Material auditing method, computing equipment and storage medium
CN108073703A (en) A kind of comment information acquisition methods, device, equipment and storage medium
CN108446695A (en) Method, apparatus and electronic equipment for data mark
CN109816004A (en) Source of houses picture classification method, device, equipment and storage medium
CN111401959B (en) Risk group prediction method, apparatus, computer device and storage medium
CN109582834A (en) Data Risk Forecast Method and device
CN109345081A (en) A kind of collecting method, device and electronic equipment
CN112416772A (en) Test case completion method and device, electronic equipment and readable storage medium
CN106294115A (en) The method of testing of a kind of application system animal migration and device
CN110162689A (en) Information-pushing method, device, computer equipment and storage medium
WO2021051568A1 (en) Method and apparatus for constructing road network topological structure, and computer device and storage medium
CN111611388A (en) Account classification method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200922

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Applicant before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20200922

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

GR01 Patent grant