CN110222244A - Audit push method and device for labeled data - Google Patents
Audit push method and device for labeled data
- Publication number
- CN110222244A (application number CN201910458916.4A)
- Authority
- CN
- China
- Prior art keywords
- labeled data
- auditing
- labeler
- audit
- data
- Legal status
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/90335—Query processing
- G06F16/90348—Query processing by searching ordered data, e.g. alpha-numerically ordered data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/9035—Filtering based on additional data, e.g. user or group profiles
Abstract
The invention discloses an audit push method and device for labeled data. It relates to the technical field of data processing, and its main purpose is to push the labeled data most worth auditing to auditors, so as to improve the efficiency of auditing labeled data. The main technical solution comprises: determining an ordering of the unaudited labeled data in an unaudited labeled data set; based on that ordering, extracting unaudited labeled data from the set and pushing it to a designated auditor; and collecting the auditor's audit result data and, at a set frequency, updating the ordering of the unaudited labeled data in the set based on the collected audit result data.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to an audit push method and device for labeled data.
Background art
With the arrival of the big data era, data volumes in many industries are growing exponentially. To make better use of massive data, the data is usually labeled so that it can better drive production, operations, daily life, and other activities. When data is applied in scenarios such as machine learning and data mining, the labeled data is usually audited so that the labels are better and more accurate.
At present, labeled data is generally audited manually. In manual auditing, the auditor must audit the labeled data item by item, and the manual audit process ends only after every item has been reviewed. This approach therefore relies entirely on the auditor's own initiative, and the auditor cannot prioritize items according to whether their labels are likely to be good or bad. Auditing is thus largely blind, which results in low audit efficiency.
Summary of the invention
In view of this, the invention provides an audit push method and device for labeled data, whose main purpose is to push the labeled data most worth auditing to auditors, so as to improve the efficiency of auditing labeled data.
In a first aspect, the present invention provides an audit push method for labeled data, the method comprising:
Determining an ordering of the unaudited labeled data in an unaudited labeled data set;
Based on the ordering of the unaudited labeled data in the set, extracting unaudited labeled data from the set and pushing it to a designated auditor; and
Collecting the auditor's audit result data and, at a set frequency, updating the ordering of the unaudited labeled data in the set based on the collected audit result data.
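The three steps of the first aspect can be illustrated with a minimal Python sketch. Everything named here is hypothetical: `AuditPusher`, the `priority` callback (assumed to return a higher value for items more worth auditing), and `update_every`, which stands in for the unspecified "set frequency" by re-ranking after a fixed number of collected results. The patent does not prescribe this structure.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical signature: priority(item_id, audit_results) -> float, higher = more worth auditing
PriorityFn = Callable[[str, Dict[str, bool]], float]


@dataclass
class AuditPusher:
    priority: PriorityFn
    update_every: int = 3  # stand-in for the "set frequency"
    pending: List[str] = field(default_factory=list)     # unaudited labeled data set
    results: Dict[str, bool] = field(default_factory=dict)  # item -> labeled correctly?
    since_update: int = 0

    def rank(self) -> None:
        # Step 1: order unaudited items, most worth auditing first (sort is stable)
        self.pending.sort(key=lambda item: self.priority(item, self.results), reverse=True)

    def push(self, batch_size: int) -> List[str]:
        # Step 2: extract the top items and hand them to the auditor
        batch = self.pending[:batch_size]
        del self.pending[:batch_size]
        return batch

    def collect(self, item: str, correct: bool) -> None:
        # Step 3: record the audit result; re-rank once enough new results have arrived
        self.results[item] = correct
        self.since_update += 1
        if self.since_update >= self.update_every:
            self.rank()
            self.since_update = 0
```

Counting results rather than elapsed time is only one reading of "a set frequency"; a timer-based trigger would fit the text equally well.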
In a second aspect, the present invention provides an audit push device for labeled data, the device comprising:
A determination unit, configured to determine an ordering of the unaudited labeled data in an unaudited labeled data set;
A push unit, configured to extract unaudited labeled data from the set based on that ordering and push it to a designated auditor; and
An updating unit, configured to collect the auditor's audit result data and, at a set frequency, update the ordering of the unaudited labeled data in the set based on the collected audit result data.
In a third aspect, the present invention provides a computer-readable storage medium comprising a stored program, wherein, when the program runs, it controls the device on which the storage medium resides to execute the audit push method for labeled data according to any one of the first aspect.
In a fourth aspect, the present invention provides a storage management device, comprising:
A memory, configured to store a program;
A processor, coupled to the memory and configured to run the program to execute the audit push method for labeled data according to any one of the first aspect.
With the above technical solution, the audit push method and device for labeled data provided by the invention determine an ordering of the unaudited labeled data in an unaudited labeled data set and, based on that ordering, extract unaudited labeled data from the set and push it to a designated auditor. As the auditor finishes auditing labeled data, the auditor's audit result data are collected, and the ordering of the unaudited labeled data in the set is updated at a set frequency based on the collected audit result data. Thus, during auditing, the labeled data most worth auditing is pushed to the auditor according to an ordering that is kept up to date, which improves the efficiency of auditing labeled data.
The above description is only an overview of the technical solution of the present invention. Specific embodiments are given below so that the technical means of the invention can be understood more clearly and implemented according to the contents of this specification, and so that the above and other objects, features, and advantages of the invention are more apparent.
Brief description of the drawings
To explain the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings needed to describe them are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 shows a flowchart of an audit push method for labeled data provided by one embodiment of the present invention;
Fig. 2 shows a flowchart of an audit push method for labeled data provided by another embodiment of the present invention;
Fig. 3 shows a schematic structural diagram of an audit push device for labeled data provided by one embodiment of the present invention;
Fig. 4 shows a schematic structural diagram of an audit push device for labeled data provided by another embodiment of the present invention.
Detailed description of embodiments
Exemplary embodiments of the disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments, the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the invention is understood more thoroughly and the scope of the disclosure is fully conveyed to those skilled in the art.
As shown in Fig. 1, an embodiment of the invention provides an audit push method for labeled data, which mainly comprises:
101. Determine an ordering of the unaudited labeled data in the unaudited labeled data set.
In practical applications, the unaudited labeled data set contains a large amount of unaudited labeled data. This labeled data is produced by at least one labeler labeling original unlabeled data, which may include, but is not limited to, one or more of text data, image data, voice data, and video data.
Specifically, depending on how the unaudited labeled data set is maintained and on the audit requirements, the set takes at least the following forms:
First, the set contains a set quantity of unaudited labeled data, and the number of unaudited items in it decreases as manual auditing proceeds.
Second, the set contains a set quantity of unaudited labeled data. As manual auditing reduces the number of items in the set, new labeled data from at least one labeler is obtained through a specified interface and added to the set, so that the number of unaudited items in the set stays constant.
Third, the number of unaudited items in the set is not limited. The set corresponds to at least one specific labeler, and the labeled data that labeler produces within a set time period is collected into the unaudited labeled data set.
Fourth, the number of unaudited items in the set is not limited. The set corresponds to at least one specific labeler, and during manual auditing, new labeled data from that labeler is obtained through a specified interface and added to the set, so that new labeled data is also audited manually in time.
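The second maintenance form, in which the set is topped back up to a fixed size, might look like the following sketch. `refill_pool` and the `fetch_new` callback are hypothetical; `fetch_new` stands in for the unspecified "specified interface" and is assumed to return new labeled items, at most the requested number.

```python
from typing import Callable, List


def refill_pool(pool: List[str], target_size: int,
                fetch_new: Callable[[int], List[str]]) -> List[str]:
    """Top the unaudited pool back up to target_size (second maintenance form).

    fetch_new(n) is a hypothetical stand-in for the specified interface: it is
    asked for the n missing items and may return fewer if none are available.
    """
    missing = target_size - len(pool)
    if missing > 0:
        # Slice defensively in case the interface returns more than requested
        pool.extend(fetch_new(missing)[:missing])
    return pool
```

The first form is the same pool without the refill call, and the fourth form corresponds to calling `fetch_new` without a size cap.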
Specifically, determining the ordering of the unaudited labeled data in the set comprises: determining a confidence value for each item in the unaudited labeled data set, where the confidence value is related to the probability that the item is labeled correctly; and determining the ordering of the unaudited items based on these confidence values.
The confidence value in the embodiments of the invention is related to the probability that the corresponding item is labeled correctly; that is, it reflects the correctness of the item's label. The items can therefore be ordered by confidence value, so that the auditor audits the items most worth auditing first and the audit becomes more targeted. The confidence value of labeled data can be determined in at least the following ways:
First, obtain the confidence value of each item in the unaudited labeled data set from a specified interface, and use each obtained value as the confidence value of the corresponding item. The specified interface here is connected to a computing platform that calculates confidence values for labeled data. Whenever a confidence value is needed, it is simply fetched through the interface; because nothing has to be computed locally, the confidence values can be determined quickly.
Second, obtain the audit result data of audited labeled data, and determine the confidence value of each item in the unaudited labeled data set based on the audit result data.
It should be noted that, when the unaudited labeled data set is ordered for the first time, the audit result data are those of a set quantity of historically audited labeled data. When the set is not being ordered for the first time, the audit result data comprise both the audit result data of a set quantity of audited labeled data and the collected audit result data of the auditor. This refines the confidence values of the unaudited items so that they better reflect the probability that each item is labeled correctly.
Specifically, the audit result data of a set quantity of audited labeled data can be obtained in at least the following four ways.
First, determine a set quantity of audited labeled data from a database that stores audited labeled data, and extract the audit result data of the determined items. The determined items may be identical, related, or similar to the unaudited labeled data; whether an audited item is identical, similar, or related to an unaudited item can be judged by semantic similarity.
Second, use a web crawler to obtain, from a specific network platform, audited labeled data belonging to the same type of labeling task as the unaudited labeled data; determine a set quantity of audited items from what was obtained, and extract their audit result data. As above, the determined items may be identical, related, or similar to the unaudited labeled data, judged by semantic similarity. The labeling task type may be related to the modality of the original data (for example, text data or video data) or to the industry to which the original data belongs.
Third, extract a certain number of items from the unaudited labeled data set, push them to the auditor, and collect the auditor's audit result data for these items. The certain number is either a preset quantity, for example 100, or a percentage of the total amount of labeled data in the set; for example, if the set holds 1000 items and the percentage is 10%, the number is 1000 × 10% = 100.
Fourth, when the items in the unaudited labeled data set need to be updated based on the audit results of already audited items from the same set, the audit result data of the set quantity of audited labeled data comprise the audit result data obtained from the database or network platform together with the collected audit result data of the auditor for items in the set. This approach refines the confidence values of the unaudited items so that they better reflect the probability that each item is labeled correctly.
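For the third way, the size of the initial audit sample can be computed as in this sketch. The function name is hypothetical, and rounding a fractional count up and capping it at the set size are assumptions not stated in the text. With 1000 items and 10%, it yields 100, matching the example above.

```python
import math
from typing import Optional


def seed_sample_size(total: int, fixed: Optional[int] = None,
                     percent: Optional[int] = None) -> int:
    """How many unaudited items to push for the initial audit round.

    Either a preset quantity (e.g. 100) or a percentage of the set (e.g. 10).
    Rounding up and capping at `total` are assumptions for illustration.
    """
    if fixed is not None:
        return min(fixed, total)
    if percent is not None:
        # Integer product before dividing keeps the arithmetic exact for whole percentages
        return min(math.ceil(total * percent / 100), total)
    raise ValueError("specify either fixed or percent")
```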
Specifically, the audit result data include the following information: the historical labeling behavior information of the labeler of the unaudited labeled data, and/or the labeler's labeling behavior information for the unaudited labeled data.
Specifically, methods for determining the confidence value of each item in the unaudited labeled data set from the information in the audit result data include at least the following:
Method one: calculate the confidence value of an unaudited item from the content of the historical labeling behavior information of its labeler.
Specifically, this historical labeling behavior information includes: the number of correctly labeled items and the number of incorrectly labeled items among the labeler's audited labeled data.
Method two: calculate the confidence value of an unaudited item from the content of the labeler's labeling behavior information for that item together with the content of the labeler's historical labeling behavior information.
Specifically, the labeler's labeling behavior information for the unaudited item includes one or more of the following: the time the labeler spent labeling the item, the point in time at which the labeler labeled it, and the number of items between this item and the labeler's most recent incorrectly labeled item.
Specifically, the historical labeling behavior information of the labeler includes one or more of the following: the labeler's average labeling time for correctly labeled items among the audited labeled data; the time periods in which the labeler's incorrectly labeled items among the audited labeled data occurred; the average interval, in items, between incorrectly labeled items among the labeler's audited labeled data; the number of correctly labeled items among the labeler's audited labeled data; and the total amount of the labeler's audited labeled data.
Method three: calculate the confidence value of an unaudited item from the content of the historical labeling behavior information of its labeler.
Specifically, this historical labeling behavior information includes: the number of correctly labeled items among the labeler's audited labeled data, and the total amount of the labeler's audited labeled data.
Method four: combine method two with method three to determine the confidence values of an unaudited item.
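Methods one and three list their inputs (the labeler's correct, incorrect, and total audited items) but give no formula. One plausible reading, shown here only as an assumption, is a smoothed accuracy (correct + a) / (total + a + b), where the pseudo-counts a and b (both 1 by default, i.e. Laplace smoothing) keep a labeler with no audit history at 0.5 instead of an undefined 0/0. `LabelerHistory` and `history_confidence` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class LabelerHistory:
    correct: int  # audited items this labeler labeled correctly
    total: int    # all audited items of this labeler

    @property
    def wrong(self) -> int:
        # Method one's "incorrectly labeled" count follows from the other two
        return self.total - self.correct


def history_confidence(h: LabelerHistory, prior_correct: float = 1.0,
                       prior_wrong: float = 1.0) -> float:
    """Assumed confidence formula: smoothed share of correctly labeled audited items."""
    return (h.correct + prior_correct) / (h.total + prior_correct + prior_wrong)
```

Under method four, an item would carry this history-based value plus a second value from method two's per-item features, whose combination the text likewise leaves open.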
The following describes how the ordering of the unaudited labeled data is determined from the confidence value of each item. Methods for determining the ordering include at least the following:
First, select from the unaudited labeled data set the unaudited items whose confidence values lie within a preset threshold interval; then determine the ordering of the selected items according to the content of the audit behavior information of their labelers.
Specifically, the audit behavior information includes one or more of the following: the number of consecutive correctly labeled items by the labeler, the number of consecutive incorrectly labeled items by the labeler, and the number of times the labeler has been audited.
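The three kinds of audit behavior information can be derived from the auditor's results as they arrive. This sketch assumes results are processed in audit order as (labeler, correct) pairs and reads "consecutive" as the current trailing run; both the data shape and that reading are assumptions, and `audit_behavior` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def audit_behavior(results: List[Tuple[str, bool]]) -> Dict[str, Dict[str, int]]:
    """Per-labeler audit behavior from chronologically ordered audit results.

    results: (labeler_id, labeled_correctly) pairs in the order they were audited.
    Returns, per labeler, the current run of consecutive correct labels, the
    current run of consecutive incorrect labels, and how often it was audited.
    """
    stats: Dict[str, Dict[str, int]] = {}
    for labeler, correct in results:
        s = stats.setdefault(labeler, {"correct_run": 0, "wrong_run": 0, "reviews": 0})
        s["reviews"] += 1
        if correct:
            s["correct_run"] += 1
            s["wrong_run"] = 0  # an error streak ends at the first correct label
        else:
            s["wrong_run"] += 1
            s["correct_run"] = 0  # a correct streak ends at the first error
    return stats
```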
Specifically, there are two kinds of confidence values. One is obtained from the historical labeling behavior information of the item's labeler, or from the labeler's labeling behavior information for that item. The other is obtained from both the labeler's labeling behavior information for that item and the labeler's historical labeling behavior information. An item may have one or more of these confidence values. Therefore, depending on which confidence values an item has, selecting from the unaudited labeled data set the unaudited items whose confidence values lie within a preset threshold interval takes at least the following forms:
First, when the confidence value of each unaudited item is obtained from the historical labeling behavior information of its labeler, or from the labeler's labeling behavior information for that item, select from the set the unaudited items whose confidence values lie within a first threshold interval. Items whose confidence values lie outside the first threshold interval are very likely either correctly or incorrectly labeled, so they are not worth auditing; for items inside the first threshold interval, the correctness of the label cannot be judged clearly, so these are the items most worth auditing.
Second, when an unaudited item has two confidence values, one obtained from the historical labeling behavior information of its labeler and the other obtained from both the labeler's labeling behavior information for that item and the labeler's historical labeling behavior information: first, using the history-based confidence value, select from the set the unaudited items whose confidence values lie within a second threshold interval; then order the selected items by the confidence value based on both item-level and historical labeling behavior information, and from them choose the items whose confidence values lie within a third threshold interval.
Specifically, all items from the same labeler share the same confidence value obtained from that labeler's historical labeling behavior information. This value can therefore be used first to screen labelers, selecting all the labeled data of certain labelers. Items whose history-based confidence values lie outside the second threshold interval are very likely either correctly or incorrectly labeled and are not worth auditing, while items inside the second threshold interval cannot be judged clearly and are therefore kept. These are then screened again using the confidence value based on both item-level and historical labeling behavior information; the items kept at this stage may belong to different labelers. Items whose confidence values lie outside the third threshold interval are very likely either correctly or incorrectly labeled and are not worth auditing, whereas items inside the third threshold interval cannot be judged clearly and are therefore the items most worth auditing.
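The two-stage screening can be sketched as follows. The helper names and the closed-interval test are assumptions, and the ascending sort in stage 2 is only a placeholder: the text says the survivors are ordered by the second confidence value but does not state a direction. The first selection form corresponds to a single `in_interval` check on one confidence value.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]


def in_interval(x: float, bounds: Interval) -> bool:
    lo, hi = bounds
    return lo <= x <= hi  # closed interval is an assumption


def two_stage_select(items: List[Tuple[str, str]],
                     labeler_conf: Dict[str, float],
                     item_conf: Dict[str, float],
                     second: Interval, third: Interval) -> List[str]:
    """items: (item_id, labeler_id) pairs.

    labeler_conf: history-based confidence, shared by every item of a labeler.
    item_conf: confidence from item-level behavior plus history.
    """
    # Stage 1: keep every item of labelers whose history-based confidence is uncertain
    stage1 = [(i, l) for i, l in items if in_interval(labeler_conf[l], second)]
    # Stage 2: order survivors by item-level confidence, keep those still uncertain
    stage1.sort(key=lambda pair: item_conf[pair[0]])
    return [i for i, _ in stage1 if in_interval(item_conf[i], third)]
```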
It should be noted that, whichever of the above ways is used to select the unaudited items whose confidence values lie within the preset threshold interval, the unaudited items outside the threshold interval can be handled as follows.
First, for the items outside the interval that are most likely labeled incorrectly: to reduce the auditor's workload, these items do not take part in the audit and are sent directly, through a corresponding interface, to designated labeling personnel for relabeling. The designated labeling personnel are the original labelers of these items, or labelers whom the system considers more likely to label correctly.
Second, for the items outside the interval that are most likely labeled correctly: to reduce the auditor's workload, these items do not take part in the audit; they are deemed not to need manual auditing and are marked directly as passed.
Third, the unaudited items outside the threshold interval remain in the unaudited labeled data set. When a preset quantity of new audit result data for audited labeled data has been obtained, the confidence value of each unaudited item is updated based on the existing and the newly obtained audit result data. This refines the confidence values so that they better reflect the probability that each item is labeled correctly.
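The first two treatments of out-of-interval items can be combined into one routing rule, as in this sketch. It assumes that a higher confidence value means a higher probability of a correct label; the `Route` names and `route` function are hypothetical. Under the third treatment, both out-of-interval branches would instead leave the item in the set until its confidence is recomputed.

```python
from enum import Enum
from typing import Tuple


class Route(Enum):
    AUDIT = "audit"          # inside the interval: push to the auditor
    RELABEL = "relabel"      # first treatment: likely wrong, send back for relabeling
    AUTO_PASS = "auto_pass"  # second treatment: likely correct, mark as passed


def route(confidence: float, bounds: Tuple[float, float]) -> Route:
    """Assumes higher confidence = higher probability that the label is correct."""
    lo, hi = bounds
    if confidence < lo:
        return Route.RELABEL
    if confidence > hi:
        return Route.AUTO_PASS
    return Route.AUDIT
```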
To further identify the items worth auditing, after the unaudited items whose confidence values lie within the preset threshold interval have been selected from the set, the ordering of the selected items is determined according to the content of the audit behavior information of their labelers.
For example, the audit behavior information includes the number of consecutive correctly labeled items by the labeler. This number is obtained from the collected audit result data of the auditor for items in the unaudited labeled data set. When a labeler's number of consecutive correctly labeled items exceeds a set quantity threshold, the labeler has a low probability of labeling errors, so when the selected unaudited items are ordered, that labeler's items are placed in a position indicating they are not worth auditing (for example, the end of the queue).
For example, the audit behavior information includes the number of consecutive incorrectly labeled items by the labeler. This number is obtained from the collected audit result data of the auditor for items in the unaudited labeled data set. When a labeler's number of consecutive incorrectly labeled items exceeds a set quantity threshold, the labeler has a high probability of labeling errors, so when the selected unaudited items are ordered, that labeler's items are placed in a position indicating they are worth auditing (for example, the head of the queue).
For example, the audit behavior information includes the number of times the labeler has been audited. This number is obtained from the collected audit result data of the auditor for items in the unaudited labeled data set. When a labeler has been audited fewer times than a set count threshold, there is too little evidence to judge whether the labeler tends to label correctly or incorrectly. That labeler's items are therefore worth auditing and are placed in a position indicating this (for example, the head of the queue).
For example, the audit behavior information includes both the number of consecutive incorrectly labeled items by the labeler and the number of times the labeler has been audited. When the labeler has been audited fewer times than the set count threshold and its number of consecutive incorrectly labeled items exceeds the set quantity threshold, the labeler has a high probability of labeling errors, so when the selected unaudited items are ordered, that labeler's items are placed in a position indicating they are worth auditing (for example, the head of the queue).
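The four examples above can be folded into a tiered ordering. In this sketch the threshold values, the tier numbers, and the precedence (a head-of-queue rule wins over the end-of-queue rule when both match) are assumptions, since the text states each rule separately; the function names are hypothetical. Labelers with no recorded behavior are treated as never audited and therefore go to the head of the queue.

```python
from typing import Dict, List, Tuple

NEW_LABELER = {"correct_run": 0, "wrong_run": 0, "reviews": 0}


def behavior_tier(stats: Dict[str, int], run_threshold: int, review_threshold: int) -> int:
    """Return 0 for the head of the queue, 1 for the middle, 2 for the end."""
    if stats["wrong_run"] > run_threshold or stats["reviews"] < review_threshold:
        return 0  # error streak or rarely audited labeler: audit first
    if stats["correct_run"] > run_threshold:
        return 2  # long correct streak: least worth auditing
    return 1


def order_by_behavior(items: List[Tuple[str, str]], behavior: Dict[str, Dict[str, int]],
                      run_threshold: int = 3, review_threshold: int = 5) -> List[str]:
    """items: (item_id, labeler_id) pairs that already passed the interval filter."""
    def tier(pair: Tuple[str, str]) -> int:
        return behavior_tier(behavior.get(pair[1], NEW_LABELER), run_threshold, review_threshold)
    # sorted() is stable, so items keep their previous relative order within a tier
    return [item for item, _ in sorted(items, key=tier)]
```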
Second, select from the unaudited labeled data set the unaudited items whose confidence values lie within a preset threshold interval, and determine the ordering of the selected items by the magnitude of their confidence values.
Specifically, the selection of unaudited items whose confidence values lie within the preset threshold interval is essentially the same as in the first method and is not repeated here.
Specifically, when items are selected using the confidence value obtained from the labeler's historical labeling behavior information, or using the confidence value obtained from the labeler's labeling behavior information for the item, the ordering of the selected unaudited items is determined by the magnitude of that same confidence value. This ordering represents how worth auditing each item is.
Specifically, when an unaudited item has two confidence values, one obtained from the labeler's historical labeling behavior information and the other obtained from both the labeler's labeling behavior information for the item and the labeler's historical labeling behavior information, and both are used to select items, the ordering of the selected items is determined by the magnitude of the second of these, the confidence value based on both item-level and historical labeling behavior information. This ordering represents how worth auditing each item is.
Third, the ordering score of each unaudited labeled data item is determined using formula (1), and the ordering of the unaudited labeled data is determined by the magnitude of the ordering scores.
Wherein, S_j denotes the ordering score of the j-th unaudited labeled data item; M_j denotes the confidence value of the j-th unaudited labeled data item; M_in denotes the confidence value of the n-th unaudited labeled data item of labeler i among the unaudited labeled data.
Specifically, the ordering score reflects how necessary it is to audit the corresponding labeled data, i.e., whether that labeled data is worth auditing.
It should be noted that, before the ordering score of each unaudited labeled data item is determined with formula (1), the unaudited labeled data whose confidence values fall within a preset threshold interval may first be selected from the unaudited labeled data set. This preliminary screening of the labeled data reduces the amount of labeled data to be audited.
102. Based on the ordering of the unaudited labeled data in the unaudited labeled data set, extract unaudited labeled data from the set and push them to the designated auditor.
Specifically, the ordering of the unaudited labeled data reflects whether each item is worth auditing. When unaudited labeled data are extracted and pushed to the designated auditor, pushing starts from the labeled data most worth auditing, so each push delivers the item currently ranked as most worth auditing. This allows the auditor to audit the labeled data in a more targeted way.
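Pushing "from the head of the queue" maps naturally onto a priority queue. The sketch below uses Python's standard `heapq` module. The class name `AuditQueue` is hypothetical, and, as in the ordering step, it assumes that the item with the lowest confidence value is the one most worth auditing.

```python
import heapq


class AuditQueue:
    """Hypothetical push queue: always hands out the unaudited item with the
    lowest confidence value (assumed to be the most worth auditing)."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker: equal confidences pop in insertion order

    def add(self, item_id, confidence):
        heapq.heappush(self._heap, (confidence, self._counter, item_id))
        self._counter += 1

    def push_next(self):
        """Return (item_id, confidence) for the next item to send to the
        auditor, or None when nothing is left to audit."""
        if not self._heap:
            return None
        confidence, _, item_id = heapq.heappop(self._heap)
        return item_id, confidence
```

The counter in each heap entry keeps the tuples comparable even when two items share a confidence value, so `item_id` values never need to be ordered themselves.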
Specifically, when unaudited labeled data are extracted from the set and pushed to the designated auditor, the labeled data and their corresponding confidence values can be displayed together in a visual form.
103. Collect the auditing result data of the auditor and, at a certain frequency, update the ordering of the unaudited labeled data in the unaudited labeled data set based on the collected auditing result data.
Specifically, when the auditor finishes auditing labeled data, the auditing result data can be obtained through a specified interface. When the number of audited labeled data reaches a set quantity, or the current time reaches a preset time, the ordering of the unaudited labeled data in the set is updated based on the collected auditing result data. This optimizes the confidence values of the labeled data so that they better reflect the probability that the corresponding labels are correct. It should be noted that updating the ordering based on the collected auditing result data follows essentially the same process as described above: determine the confidence value of each labeled data item in the unaudited labeled data set, then determine the ordering of the unaudited labeled data from those confidence values. It is therefore not repeated here.
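The update rule of step 103 fires on whichever condition comes first: enough new audit results, or enough elapsed time. A minimal sketch follows. The class `RerankTrigger` and its parameters are hypothetical, and "the current time reaches a preset time" is interpreted here as a fixed interval since the last update, which is an assumption.

```python
import time


class RerankTrigger:
    """Hypothetical helper: collects audit results and signals that the
    ranking should be updated once `batch_size` new results have arrived or
    `interval_s` seconds have passed since the last update."""

    def __init__(self, batch_size, interval_s, clock=time.monotonic):
        self.batch_size = batch_size
        self.interval_s = interval_s
        self.clock = clock  # injectable so the timing logic can be tested
        self.pending = []
        self.last_update = clock()

    def record(self, item_id, passed):
        """Store one audit result; return True if re-ranking is due now."""
        self.pending.append((item_id, passed))
        return self.should_update()

    def should_update(self):
        return (len(self.pending) >= self.batch_size
                or self.clock() - self.last_update >= self.interval_s)

    def drain(self):
        """Hand the collected results to the re-ranking step and reset."""
        batch, self.pending = self.pending, []
        self.last_update = self.clock()
        return batch
```

The batch returned by `drain()` would then feed the confidence recalculation and re-sorting described above.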
In the method for pushing labeled data for auditing provided by this embodiment of the present invention, the ordering of the unaudited labeled data in the unaudited labeled data set is determined, and unaudited labeled data are extracted from the set according to that ordering and pushed to the designated auditor. As the auditor audits labeled data, the auditor's auditing result data are collected, and at a certain frequency the ordering of the unaudited labeled data in the set is updated based on the collected results. Thus, during auditing, labeled data with greater auditing value can be pushed to the auditor according to the continually updated ordering, which improves the efficiency of auditing labeled data.
Further, based on the method shown in Fig. 1, another embodiment of the present invention also provides a method for pushing labeled data for auditing. As shown in Fig. 2, the method mainly includes:
201. Determine the confidence value of each labeled data item in the unaudited labeled data set. The confidence value is related to the probability that the label of the corresponding labeled data is correct.
Specifically, the form of the labeled data set involved in this step, and the method for obtaining the confidence value of a labeled data item in the set, are essentially the same as detailed in step 101 above, so they are not repeated here. The following elaborates on the second confidence-value acquisition method from the detailed explanation of step 101, namely "obtain the auditing result data of audited labeled data, and determine the confidence value of each labeled data item in the unaudited labeled data set based on the auditing result data". The specific step of determining these confidence values from the auditing result data is: for each unaudited labeled data item, determine its confidence value based on the information contained in the auditing result data. It should be noted that the auditing result data include the following information: the historical labeling behavior information of the labeler of the unaudited labeled data, and/or the labeler's labeling behavior information for the unaudited labeled data. Depending on which information the auditing result data contain, determining the confidence value of each unaudited labeled data item from that information includes at least the following implementations:
First, calculate the confidence value of the unaudited labeled data based on the content of the historical labeling behavior information of its labeler. Specifically, this historical labeling behavior information includes the following: the number of audited labeled data that the labeler of the unaudited labeled data labeled correctly, and the number of audited labeled data that the labeler labeled incorrectly.
Specifically, there are the following two methods for calculating the confidence value of the unaudited labeled data from the content of its labeler's historical labeling behavior information:
1. Calculate the confidence value of the unaudited labeled data by formula (2).
Wherein, M_j denotes the confidence value of the j-th unaudited labeled data item; A_i denotes the number of audited labeled data that labeler i of the j-th unaudited labeled data item labeled correctly; B_i denotes the number of audited labeled data that labeler i of the j-th unaudited labeled data item labeled incorrectly; a is a constant greater than 0; b is a constant greater than 0.
Specifically, the confidence value calculated by formula (2) depends only on the labeler's historical behavior; that is, all labeled data labeled by the same labeler have the same confidence value. When the unaudited labeled data are sorted in this way, the labeled data of the same labeler are arranged contiguously. From this ordering it can be determined which labeler's labeled data are most worth auditing, so that the labeled data of the same labeler can be pushed and audited together. For confidence values calculated by formula (2), the larger the confidence value of a labeled data item, the higher the probability that its label is correct.
Specifically, the values of constants a and b in formula (2) can be determined according to business needs. Illustratively, a and b are both 1. Constants a and b are introduced, and both are set greater than 0, to avoid the case in which A and/or B equals 0 and the confidence value cannot be determined.
Formula (2) is illustrated below with an example: for labeler 1 of unaudited labeled data 1, among 1000 audited labeled data, 900 were labeled correctly and 100 were labeled incorrectly, and constants a and b are both 1. The confidence value of unaudited labeled data 1 determined by formula (2) is then:
2. Calculate the confidence value of the unaudited labeled data by formula (3).
Wherein, M_j denotes the confidence value of the j-th unaudited labeled data item; A_i denotes the number of audited labeled data that labeler i of the j-th unaudited labeled data item labeled correctly; B_i denotes the number of audited labeled data that labeler i of the j-th unaudited labeled data item labeled incorrectly; e is a constant greater than 0; f is a constant greater than 0; g is a constant greater than 0.
Specifically, the confidence value calculated by formula (3) depends only on the labeler's historical behavior; that is, all labeled data labeled by the same labeler have the same confidence value. When the unaudited labeled data are sorted in this way, the labeled data of the same labeler are arranged contiguously. From this ordering it can be determined which labeler's labeled data are most worth auditing, so that the labeled data of the same labeler can be pushed and audited together. For confidence values calculated by formula (3), the larger the confidence value of a labeled data item, the higher the probability that its label is correct.
Specifically, the values of constants e, f and g in formula (3) can be determined according to business needs. Illustratively, e, f and g are all 1. Constants e, f and g are introduced, and all are set greater than 0, to avoid the case in which A = 0 and/or B = 0 and the confidence value cannot be determined.
Formula (3) is illustrated below with an example: for labeler 2 of unaudited labeled data 2, among 1000 audited labeled data, 900 were labeled correctly and 100 were labeled incorrectly, and constants e, f and g are all 1. The confidence value of unaudited labeled data 2 determined by formula (3) is then:
Second, calculate the confidence value of the unaudited labeled data based on the content of its labeler's labeling behavior information for that item and the content of the labeler's historical labeling behavior information. The labeler's labeling behavior information for the unaudited labeled data includes one or more of the following: the labeling duration the labeler spent on the unaudited labeled data; the time point at which the labeler labeled the unaudited labeled data; and the number of items between the unaudited labeled data and the labeler's most recent incorrectly labeled data. The historical labeling behavior information of the labeler of the unaudited labeled data includes one or more of the following: the average labeling duration of the correctly labeled data among the labeler's audited labeled data; the error period corresponding to the incorrectly labeled data among the labeler's audited labeled data; the average interval (in number of items) at which incorrectly labeled data occur among the labeler's audited labeled data; the number of correctly labeled data among the labeler's audited labeled data; and the total number of the labeler's audited labeled data.
Specifically, because the content of the labeler's labeling behavior information for the unaudited labeled data can vary, as can the content of the labeler's historical labeling behavior information, the methods for calculating the confidence value of the unaudited labeled data from these two kinds of information include the following:
1. Based on the labeling duration the labeler spent on the unaudited labeled data and the average labeling duration of the correctly labeled data among the labeler's audited labeled data, calculate the confidence value of the unaudited labeled data by formula (4).
Wherein, M_j denotes the confidence value of the j-th unaudited labeled data item; T_ij denotes the labeling duration that labeler i of the j-th unaudited labeled data item spent labeling that item; R_i denotes the average labeling duration of the correctly labeled data among the audited labeled data of labeler i; n is a constant greater than or equal to 1.
Specifically, a smaller confidence value calculated by formula (4) indicates that the labeler spent a shorter time labeling the labeled data, i.e., less than the time the labeler normally spends on labeling. Perfunctory labeling is then more likely, so the labeled data are more likely to be labeled incorrectly. A larger confidence value calculated by formula (4) indicates that the labeler spent longer labeling the data, i.e., the time normally spent or more. Careful labeling is then more likely, so the labeled data are less likely to be labeled incorrectly. Therefore, for confidence values calculated by formula (4), the larger the confidence value of a labeled data item, the higher the probability that its label is correct.
Specifically, the value of constant n in formula (4) can be determined according to business needs. Illustratively, n is 1.
Formula (4) is illustrated below with an example: for labeler 3 of unaudited labeled data 3, labeler 3 spent 5 minutes labeling labeled data 3, the average labeling duration of the correctly labeled data among labeler 3's audited labeled data is 4 minutes, and n is 1. The confidence value of unaudited labeled data 3 determined by formula (4) is then:
2. Based on the time point at which the labeler labeled the unaudited labeled data and the error period corresponding to the incorrectly labeled data among the labeler's audited labeled data, calculate the confidence value of the unaudited labeled data by formula (5).
Wherein, M_j denotes the confidence value of the j-th unaudited labeled data item; t_ij denotes the time point at which labeler i of the j-th unaudited labeled data item labeled that item; [t1_i, t2_i] denotes the error period corresponding to the incorrectly labeled data among the audited labeled data of labeler i; m1 and m2 are constants, and m2 is greater than m1.
Specifically, a small confidence value calculated by formula (5) indicates that the labeler labeled the data within the period in which the labeler's error rate is higher, so labeled data produced within this period are more likely to be labeled incorrectly. A large confidence value calculated by formula (5) indicates that the labeler did not label the data within that high-error period, so the resulting labeled data are less likely to be labeled incorrectly. Therefore, for confidence values calculated by formula (5), the larger the confidence value of a labeled data item, the higher the probability that its label is correct.
Specifically, the values of m1 and m2 in formula (5) can be determined according to business needs. It should be noted that, to distinguish the probabilities that labeled data are correct, m2 is set greater than m1. Illustratively, m2 is 1 and m1 is 0.95.
Formula (5) is illustrated below with an example: for labeler 4 of unaudited labeled data 4, labeler 4 labeled data 4 at 13:00, the error period corresponding to the incorrectly labeled data among labeler 4's audited labeled data is [12:00, 14:00], m2 is 1, and m1 is 0.95. The confidence value of unaudited labeled data 4 determined by formula (5) is then:
$$M_4 = 0.95, \quad 13{:}00 \in [12{:}00,\ 14{:}00]$$
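The worked example (m1 = 0.95 is returned because 13:00 lies inside [12:00, 14:00]), together with the statement that a small value corresponds to labeling inside the error period, suggests that formula (5) is a two-valued rule: M_j = m1 if t_ij ∈ [t1_i, t2_i], otherwise M_j = m2. The sketch below implements that inferred rule. It is an assumption consistent with the example, not a reproduction of the original formula image, and the function name is hypothetical.

```python
from datetime import time as clock_time


def confidence_by_error_period(label_time, error_start, error_end, m1=0.95, m2=1.0):
    """Inferred sketch of formula (5): return m1 if the item was labeled inside
    the labeler's error period [error_start, error_end], otherwise m2.
    The defaults are the illustrative values from the text (m2 > m1)."""
    if not m2 > m1:
        raise ValueError("m2 must be greater than m1")
    return m1 if error_start <= label_time <= error_end else m2


# Reproduces the worked example: 13:00 lies in [12:00, 14:00], so M_4 = 0.95.
m4 = confidence_by_error_period(clock_time(13, 0), clock_time(12, 0), clock_time(14, 0))
```

Times of day are compared with `datetime.time`, which supports ordering; an error period spanning midnight would need separate handling.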
3. Based on the number of items between the unaudited labeled data and the labeler's most recent incorrectly labeled data, and the average interval (in number of items) at which incorrectly labeled data occur among the labeler's audited labeled data, calculate the confidence value of the unaudited labeled data by formula (6).
Wherein, M_j denotes the confidence value of the j-th unaudited labeled data item; P_ij denotes the number of items between the j-th unaudited labeled data item and the most recent incorrectly labeled data of its labeler i; Q_i denotes the average interval (in number of items) at which incorrectly labeled data occur among the audited labeled data of labeler i; k1 and k2 are constants, and k1 is greater than k2.
Specifically, as labeling progresses, a labeler who has labeled a certain amount of data develops labeling fatigue, which leads to incorrectly labeled data. Labeling fatigue can be characterized by the average interval (in number of items) between incorrectly labeled data, so the number of items between labeled data reflects the probability that a label is correct. A small confidence value calculated by formula (6) indicates that the labeler produced the labeled data while its labeling fatigue strength was low, so the resulting labeled data are more likely to be labeled incorrectly. A large confidence value calculated by formula (6) indicates that the labeler produced the labeled data while its labeling fatigue strength was high, so the resulting labeled data are less likely to be labeled incorrectly. Therefore, for confidence values calculated by formula (6), the larger the confidence value of a labeled data item, the higher the probability that its label is correct.
Specifically, the values of k1 and k2 in formula (6) can be determined according to business needs. It should be noted that, to distinguish the probabilities that labeled data are correct, k1 is set greater than k2. Illustratively, k1 is 1 and k2 is 0.9.
Formula (6) is illustrated below with an example: for labeler 5 of unaudited labeled data 5, the number of items between labeled data 5 and labeler 5's most recent incorrectly labeled data is 5; the average interval between incorrectly labeled data among labeler 5's audited labeled data is 100 items; k1 is 1 and k2 is 0.9. The confidence value of unaudited labeled data 5 determined by formula (6) is then:
4. Any two or more of formula (4), formula (5) and formula (6) can be combined according to business needs, and the confidence value of the unaudited labeled data is then calculated with the combined formula.
When formula (4) and formula (5) are combined, the confidence value of the unaudited labeled data is calculated using the following formula:
For confidence values calculated by this formula, the larger the confidence value of a labeled data item, the higher the probability that its label is correct. For the meaning of the variables in this formula, see formula (4) and formula (5) above. ω1 and ω2 are preset weights whose specific values can be determined according to the specific business.
When formula (4) and formula (6) are combined, the confidence value of the unaudited labeled data is calculated using the following formula:
For confidence values calculated by this formula, the larger the confidence value of a labeled data item, the higher the probability that its label is correct. For the meaning of the variables in this formula, see formula (4) and formula (6) above. ω3 and ω4 are preset weights whose specific values can be determined according to the specific business.
When formula (5) and formula (6) are combined, the confidence value of the unaudited labeled data is calculated using the following formula:
For confidence values calculated by this formula, the larger the confidence value of a labeled data item, the higher the probability that its label is correct. For the meaning of the variables in this formula, see formula (5) and formula (6) above. ω5 and ω6 are preset weights whose specific values can be determined according to the specific business.
When formula (4), formula (5) and formula (6) are combined, the confidence value of the unaudited labeled data is calculated using the following formula:
For confidence values calculated by this formula, the larger the confidence value of a labeled data item, the higher the probability that its label is correct. For the meaning of the variables in this formula, see formula (4), formula (5) and formula (6) above. ω7, ω8 and ω9 are preset weights whose specific values can be determined according to the specific business.
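The combined formulas themselves are not reproduced in this text. Because they are described only in terms of preset weights (ω1 to ω9), one plausible reading is a weighted sum of the individual confidence values. The sketch below assumes exactly that. The function name is hypothetical, and the weighted sum is an assumption, not the confirmed form of the patent's combined formulas.

```python
def combine_confidences(values, weights):
    """Hypothetical weighted combination of per-formula confidence values,
    e.g. values=[M4, M5] with weights=[w1, w2]. A plain weighted sum is
    assumed here; the patent's combined formulas are not shown."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * v for w, v in zip(weights, values))
```

For instance, combining 0.8 (from formula (4)) and 0.95 (from formula (5)) with equal weights of 0.5 gives 0.875. Choosing weights that sum to 1 keeps the result on the same scale as the inputs.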
5. Based on the time point at which the labeler labeled the unaudited labeled data, the error period corresponding to the incorrectly labeled data among the labeler's audited labeled data, the number of items between the unaudited labeled data and the labeler's most recent incorrectly labeled data, and the average interval (in number of items) at which incorrectly labeled data occur among the labeler's audited labeled data, calculate the confidence value of the unaudited labeled data by formula (7);
Wherein, M_j denotes the confidence value of the j-th unaudited labeled data item; t_ij denotes the time point at which labeler i of the j-th unaudited labeled data item labeled that item; [t1_i, t2_i] denotes the error period corresponding to the incorrectly labeled data among the audited labeled data of labeler i; m1 and m2 are constants, and m2 is greater than m1; P_ij denotes the number of items between the j-th unaudited labeled data item and the most recent incorrectly labeled data of its labeler i; Q_i denotes the average interval (in number of items) at which incorrectly labeled data occur among the audited labeled data of labeler i; k1 and k2 are constants, and k1 is greater than k2.
Specifically, the confidence value calculated by formula (7) reflects the temporal position of the labeled data at the time the labeler produced it. This position reflects the labeler's fatigue level when producing the labeled data, so the confidence value calculated by formula (7) can faithfully reflect the probability that the label is correct.
Specifically, a smaller confidence value calculated by formula (7) indicates that the labeler produced the labeled data while its labeling fatigue strength was low, so the resulting labeled data are more likely to be labeled incorrectly. A larger confidence value calculated by formula (7) indicates that the labeler produced the labeled data while its labeling fatigue strength was high, so the resulting labeled data are less likely to be labeled incorrectly. Therefore, for confidence values calculated by formula (7), the larger the confidence value of a labeled data item, the higher the probability that its label is correct.
6. Any two or more of formula (4), formula (5), formula (6) and formula (7) can be combined according to business needs, and the confidence value of the unaudited labeled data is then calculated with the combined formula.
Illustratively, when formula (4) and formula (7) are combined, the confidence value of the unaudited labeled data is calculated using the following formula:
For confidence values calculated by this formula, the larger the confidence value of a labeled data item, the higher the probability that its label is correct. For the meaning of the variables in this formula, see formula (4) and formula (7) above. ω10 and ω11 are preset weights whose specific values can be determined according to the specific business.
7. Based on the labeling duration the labeler spent on the unaudited labeled data, the time point at which the labeler labeled the unaudited labeled data, the average labeling duration of the correctly labeled data among the labeler's audited labeled data, the error period corresponding to the incorrectly labeled data among the labeler's audited labeled data, the number of correctly labeled data among the labeler's audited labeled data, and the total number of the labeler's audited labeled data, calculate the confidence value of the unaudited labeled data by formula (8);
Wherein, M_j denotes the confidence value of the j-th unaudited labeled data item; T_ij denotes the labeling duration that labeler i of the j-th unaudited labeled data item spent labeling that item; R_i denotes the average labeling duration of the correctly labeled data among the audited labeled data of labeler i; n is a constant greater than or equal to 1; t_ij denotes the time point at which labeler i of the j-th unaudited labeled data item labeled that item; [t1_i, t2_i] denotes the error period corresponding to the incorrectly labeled data among the audited labeled data of labeler i; m1 and m2 are constants, and m2 is greater than m1; P_ij denotes the number of items between the j-th unaudited labeled data item and the most recent incorrectly labeled data of its labeler i; Q_i denotes the average interval (in number of items) at which incorrectly labeled data occur among the audited labeled data of labeler i; k1 and k2 are constants, and k1 is greater than k2; α denotes the first weight; β denotes the second weight; γ denotes the third weight.
Methods 1 to 7 above calculate the confidence value of the unaudited labeled data from the content of the labeler's labeling behavior information for that item and the content of the labeler's historical labeling behavior information. The resulting confidence value therefore depends not only on the labeler's historical behavior but also on the labeler's labeling behavior for the specific labeled data item, so labeled data labeled by the same labeler may have different confidence values. When the confidence values obtained in this way are used to determine the ordering of the unaudited labeled data, it is possible to judge which labeled data item is currently most worth auditing, which makes the auditing of labeled data more targeted.
Third, calculate the confidence value of the unaudited labeled data based on the content of the historical labeling behavior information of its labeler. Here, this historical labeling behavior information includes the following: the number of correctly labeled data among the labeler's audited labeled data, and the total number of the labeler's audited labeled data.
Specifically, the ratio of the number of correctly labeled data among the labeler's audited labeled data to the total number of the labeler's audited labeled data is taken as the confidence value of the unaudited labeled data. The resulting confidence value depends only on the labeler's historical behavior; that is, all labeled data labeled by the same labeler have the same confidence value. When the unaudited labeled data are sorted by confidence values obtained in this way, it can be determined which labeler's labeled data are most worth auditing, so that the labeled data of the same labeler can be audited together. The larger the confidence value of a labeled data item, the higher the probability that its label is correct.
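The third method's ratio (correct audited items over total audited items, per labeler) and the resulting grouping of each labeler's items can be sketched as below. The ratio itself is as stated in the text. The function names, the input format of `(labeler, passed)` pairs, the lowest-confidence-first order, and the default value for labelers with no audit history are all assumptions.

```python
from collections import Counter


def labeler_confidence(audit_results):
    """audit_results: iterable of (labeler, passed) pairs for audited items.
    Returns {labeler: correct / total}, the ratio used by the third method."""
    total = Counter()
    correct = Counter()
    for labeler, passed in audit_results:
        total[labeler] += 1
        if passed:
            correct[labeler] += 1
    return {lab: correct[lab] / total[lab] for lab in total}


def rank_unaudited(unaudited, conf_by_labeler, default=0.0):
    """unaudited: list of (item_id, labeler) pairs. Each item inherits its
    labeler's confidence, so one labeler's items end up adjacent. Lowest
    confidence first and `default` for unknown labelers are assumptions."""
    return sorted(unaudited,
                  key=lambda x: (conf_by_labeler.get(x[1], default), x[1]))
```

Using the labeler name as a secondary sort key keeps each labeler's items contiguous even when two labelers happen to share the same ratio.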
It should be noted that this method, which takes the ratio of the number of correctly labeled data among the labeler's audited labeled data to the total number of the labeler's audited labeled data as the confidence value of the labeled data, can be combined with any one or more of formula (4), formula (5), formula (6) and formula (7) according to business needs to calculate the confidence value of the unaudited labeled data.
Illustratively, when this ratio is combined with formula (4), the confidence value of the unaudited labeled data is calculated using the following formula:
For confidence values calculated by this formula, the larger the confidence value of a labeled data item, the higher the probability that its label is correct. For the meaning of the variables in this formula, see the ratio described above and formula (4). ω13 and ω14 are preset weights whose specific values can be determined according to the specific business.
202. Determine the ordering of the unaudited labeled data based on the confidence value of each labeled data item.
Specifically, the process of determining the ordering of the unaudited labeled data from the confidence values of the labeled data is essentially the same as detailed in step 101, so it is not repeated here.
203. Based on the ordering of the unaudited labeled data in the unaudited labeled data set, extract unaudited labeled data from the set and push them to the designated auditor.
Specifically, this step is essentially the same as detailed in step 102, so it is not repeated here.
204. Assist the manual auditing of the labeled data based on the confidence values of the labeled data.
Specifically, assisting the manual auditing of labeled data based on their confidence values includes at least the following methods:
First, display the labeled data together with the corresponding confidence values in a visual form.
Specifically, in this method, each labeled data item and its confidence value are shown side by side in a preset visualization window. The items are displayed in the determined order, so that the auditor can use that order to quickly pick out the labeled data most worth auditing among those currently displayed.
Second, compare the manual audit result of a labeled data item with its confidence value, and when the comparison result meets a preset condition, output a prompt indicating that the audit result may be wrong.
Specifically, the manual audit result of a labeled data item is either pass or fail: pass indicates that the item is labeled correctly, and fail indicates that it is labeled incorrectly. Each manual audit result corresponds to its own confidence value interval. When an item has been audited, the confidence value interval corresponding to its manual audit result is compared with the item's confidence value in order to verify the auditor's result. If that interval does not contain the item's confidence value, the auditor is relatively likely to have made an audit error on this item, so a prompt indicating that the audit result may be wrong is output, prompting the auditor to audit the item again and thereby improving audit quality. If the interval does contain the item's confidence value, the audit is considered correct and the audit of that item is complete.
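The second assistance method can be sketched as a simple interval check. The text states that each audit result has its own confidence value interval but gives no bounds, so the intervals below are hypothetical placeholders, as are the names `RESULT_INTERVALS` and `check_audit`.

```python
from typing import Optional

# HYPOTHETICAL confidence intervals per manual audit result (closed intervals);
# the text does not specify the actual bounds.
RESULT_INTERVALS = {
    "pass": (0.5, 1.0),  # auditor judged the label correct
    "fail": (0.0, 0.5),  # auditor judged the label incorrect
}


def check_audit(result: str, confidence: float) -> Optional[str]:
    """Return a prompt if the item's confidence value falls outside the
    interval for the auditor's result; return None if the audit is accepted."""
    low, high = RESULT_INTERVALS[result]
    if low <= confidence <= high:
        return None
    return (f"Audit result '{result}' may be wrong "
            f"(confidence {confidence:.2f}); please audit this item again.")
```

For example, an item the auditor passed but whose confidence value is 0.1 would trigger the re-audit prompt under these placeholder intervals.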
Third, combine the two methods above: first, display the labeled data together with the corresponding confidence values in a visual form; then, when the auditor has finished auditing an item, compare its manual audit result with its confidence value and, when the comparison result meets the preset condition, output a prompt indicating that the audit result may be wrong.
205. Collect the audit result data of the auditors and, at a set frequency, update the ordering of the unaudited labeled data in the unaudited labeled data set based on the collected audit result data.
Specifically, this step is essentially the same as the detailed description in step 104, and is not repeated here.
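Step 205 can be sketched as a collector that re-ranks the unaudited pool after every `update_every` audit results. Interpreting the "set frequency" as a count of collected results (rather than a time interval) is an assumption, and `recompute_confidence` is a hypothetical callback standing in for whichever confidence formula is in use.

```python
from typing import Callable


class RankingUpdater:
    """Collects audit results and re-sorts the unaudited pool after every
    `update_every` collected results (sketch of step 205)."""

    def __init__(self, unaudited: dict[str, float],
                 recompute_confidence: Callable[[str, list], float],
                 update_every: int = 100):
        self.unaudited = unaudited            # item_id -> confidence value
        self.recompute = recompute_confidence
        self.update_every = update_every
        self.results: list[tuple[str, bool]] = []   # (item_id, passed)
        self.order = self._sorted_ids()

    def _sorted_ids(self) -> list[str]:
        # Least confident first (assumed to be the most worth auditing).
        return sorted(self.unaudited, key=self.unaudited.get)

    def collect(self, item_id: str, passed: bool) -> None:
        """Record one audit result; re-rank when the frequency is reached."""
        self.unaudited.pop(item_id, None)     # the item is no longer unaudited
        if item_id in self.order:
            self.order.remove(item_id)
        self.results.append((item_id, passed))
        if len(self.results) % self.update_every == 0:
            for iid in self.unaudited:
                self.unaudited[iid] = self.recompute(iid, self.results)
            self.order = self._sorted_ids()
```

Between updates the existing order is reused, which keeps pushing cheap while the collected results accumulate.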
Further, according to the above method embodiment, another embodiment of the present invention provides an audit push device for labeled data. As shown in FIG. 3, the device includes:
a determination unit 31, configured to determine the ordering of the unaudited labeled data in the unaudited labeled data set;
a push unit 32, configured to extract, based on that ordering, unaudited labeled data from the unaudited labeled data set and push it to the designated auditor; and
an updating unit 33, configured to collect the audit result data of the auditors and, at a set frequency, update the ordering of the unaudited labeled data in the unaudited labeled data set based on the collected audit result data.
The audit push device for labeled data provided by the embodiment of the present invention determines the ordering of the unaudited labeled data in the unaudited labeled data set and, based on that ordering, extracts unaudited labeled data from the set and pushes it to the designated auditor. As auditors finish auditing labeled data, their audit result data are collected, and the ordering of the unaudited labeled data is updated at a set frequency based on the collected data. Thus, during auditing, the labeled data with greater audit value are pushed to auditors according to a continuously updated ordering, which improves the efficiency of auditing labeled data.
Optionally, as shown in FIG. 4, the determination unit 31 includes:
a first determination subunit 311, configured to determine the confidence value of each labeled data item in the unaudited labeled data set, the confidence value being related to the probability that the corresponding item is labeled correctly; and
a second determination subunit 312, configured to determine the ordering of the unaudited labeled data based on the confidence value of each item.
Optionally, as shown in FIG. 4, the second determination subunit 312 includes:
a first selection module 3121, configured to select, from the unaudited labeled data set, the unaudited labeled data whose confidence values lie within a preset threshold interval; and
a first determining module 3122, configured to determine the ordering of the selected unaudited labeled data according to the content of the audit behavior information of the labeler of each selected item;
wherein the audit behavior information includes one or more of the following: the number of consecutive correctly labeled items by the labeler, the number of consecutive incorrectly labeled items by the labeler, and the number of times the labeler has been reviewed.
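The first selection module and first determining module can be sketched as a filter followed by a multi-key sort. The interval bounds, and the direction of each sort key (more consecutive errors first, then fewer consecutive correct labels, then fewer previous reviews), are assumptions: the text lists these fields but does not say how they are weighed. `Candidate` and `select_and_order` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    item_id: str
    confidence: float
    consecutive_correct: int  # labeler's current run of correct labels
    consecutive_errors: int   # labeler's current run of incorrect labels
    times_reviewed: int       # how often the labeler has been reviewed


def select_and_order(items: list[Candidate], low: float, high: float) -> list[Candidate]:
    """Keep items whose confidence lies in [low, high], then order them by
    the labeler's audit behavior (ASSUMED priority, see lead-in)."""
    chosen = [c for c in items if low <= c.confidence <= high]
    return sorted(
        chosen,
        key=lambda c: (-c.consecutive_errors, c.consecutive_correct, c.times_reviewed),
    )
```

A tuple key lets later fields break ties left by earlier ones without writing a custom comparator.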
Optionally, as shown in FIG. 4, the determination unit 31 includes:
a third determination subunit 313, configured to determine the ordering score of each unaudited labeled data item using formula (1), and to determine the ordering of the unaudited labeled data based on the magnitude of the ordering scores;
where S_j denotes the ordering score of the j-th unaudited labeled data item; M_j denotes the confidence value of the j-th unaudited labeled data item; and M_in denotes the n-th unaudited labeled data item of labeler i among the unaudited labeled data.
Optionally, as shown in FIG. 4, the second determination subunit 312 includes:
a second selection module 3123, configured to select, from the unaudited labeled data set, the unaudited labeled data whose confidence values lie within a preset threshold interval; and
a second determining module 3124, configured to determine the ordering of the selected unaudited labeled data based on the magnitude of their confidence values.
Optionally, as shown in FIG. 4, when the confidence value of each unaudited labeled data item is obtained from the historical labeling behavior information of the item's labeler, or from the labeler's labeling behavior information for that item,
the first selection module 3121 or the second selection module 3123 is configured to select, from the unaudited labeled data set, the unaudited labeled data whose confidence values lie within a first threshold interval.
Optionally, as shown in FIG. 4, when an unaudited labeled data item has two confidence values, one obtained from the historical labeling behavior information of the item's labeler and the other obtained from both the labeler's labeling behavior information for that item and the labeler's historical labeling behavior information,
the first selection module 3121 or the second selection module 3123 is configured to: select, from the unaudited labeled data set, the unaudited labeled data whose confidence values obtained from the labeler's historical labeling behavior information lie within a second threshold interval; rank the selected unaudited labeled data by the confidence values obtained from both the labeler's labeling behavior information for the item and the labeler's historical labeling behavior information; and select, from the selected unaudited labeled data, those whose confidence values lie within a third threshold interval.
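The two-stage selection can be sketched as below. The interval bounds are placeholders, the tuple layout `(item_id, history_conf, combined_conf)` is an assumption, and sorting the first-stage survivors in ascending order of the combined confidence is also an assumption, since the text only says they are ranked.

```python
def two_stage_select(items, second_interval, third_interval):
    """items: iterable of (item_id, history_conf, combined_conf) tuples.

    Stage 1: keep items whose history-based confidence lies in the second
    threshold interval. Stage 2: rank them by the combined confidence and
    keep those whose combined confidence lies in the third threshold interval.
    """
    lo2, hi2 = second_interval
    lo3, hi3 = third_interval
    stage1 = [it for it in items if lo2 <= it[1] <= hi2]
    ranked = sorted(stage1, key=lambda it: it[2])  # ascending (assumed)
    return [it for it in ranked if lo3 <= it[2] <= hi3]


candidates = [("a", 0.50, 0.70), ("b", 0.40, 0.30), ("c", 0.90, 0.35), ("d", 0.45, 0.45)]
picked = two_stage_select(candidates, (0.3, 0.6), (0.2, 0.5))
```

Item "c" has a qualifying combined confidence but is dropped in the first stage because its history-based confidence lies outside the second interval.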
Optionally, as shown in FIG. 4, the first determination subunit 311 includes:
a third determining module 3111, configured to obtain the audit result data of audited labeled data and to determine, based on the audit result data, the confidence value of each labeled data item in the unaudited labeled data set; wherein,
when the unaudited labeled data set is sorted for the first time, the audit result data are the audit result data of a set quantity of historically audited labeled data; and
when the unaudited labeled data set is not being sorted for the first time, the audit result data are the audit result data of the set quantity of audited labeled data together with the collected audit result data of the auditors.
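The choice of audit result data for the first and later sorts can be sketched as follows. Here `history_limit` stands for the "set quantity", and taking the most recent historical records is an assumption; the function name is hypothetical.

```python
def audit_data_for_ranking(history: list, collected: list,
                           first_sort: bool, history_limit: int) -> list:
    """Return the audit result records used to compute confidence values.

    history:   historically audited records, oldest first
    collected: records gathered from auditors since pushing started
    """
    # Guard: history[-0:] would return the whole list, not an empty one.
    base = history[-history_limit:] if history_limit > 0 else []
    if first_sort:
        return list(base)
    return list(base) + list(collected)
```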
Optionally, as shown in FIG. 4, the third determining module 3111 is configured to determine, for each unaudited labeled data item, its confidence value based on the information included in the audit result data;
wherein the audit result data include the following information: the historical labeling behavior information of the labeler of the unaudited labeled data item, and/or the labeler's labeling behavior information for that item.
Optionally, as shown in FIG. 4, the third determining module 3111 includes:
a first computation submodule 31111, configured to calculate the confidence value of the unaudited labeled data item based on the content of the historical labeling behavior information of its labeler;
wherein the historical labeling behavior information includes: the number of correctly labeled items and the number of incorrectly labeled items among the audited labeled data of the item's labeler.
Optionally, as shown in FIG. 4, the first computation submodule 31111 is configured to calculate the confidence value of the unaudited labeled data item by formula (2);
where M_j denotes the confidence value of the j-th unaudited labeled data item; A_i denotes the number of correctly labeled items among the audited labeled data of labeler i, the labeler of the j-th unaudited item; B_i denotes the number of incorrectly labeled items among the audited labeled data of labeler i; a is a constant greater than 0; and b is a constant greater than 0.
Optionally, as shown in FIG. 4, the first computation submodule 31111 is configured to calculate the confidence value of the unaudited labeled data item by formula (3);
where M_j denotes the confidence value of the j-th unaudited labeled data item; A_i denotes the number of correctly labeled items among the audited labeled data of labeler i, the labeler of the j-th unaudited item; B_i denotes the number of incorrectly labeled items among the audited labeled data of labeler i; e is a constant greater than 0; f is a constant greater than 0; and g is a constant greater than 0.
Optionally, as shown in FIG. 4, the third determining module 3111 includes:
a second computation submodule 31112, configured to calculate the confidence value of the unaudited labeled data item based on the content of its labeler's labeling behavior information for that item and the content of the labeler's historical labeling behavior information; wherein
the labeler's labeling behavior information for the unaudited item includes one or more of the following: the labeling duration the labeler spent on the item, the time point at which the labeler labeled the item, and the number of items between the item and the labeler's most recent incorrectly labeled item; and
the historical labeling behavior information of the labeler includes one or more of the following: the labeler's average labeling duration for correctly labeled items among the audited labeled data, the error time period corresponding to incorrectly labeled items among the labeler's audited labeled data, the average interval (in number of items) at which incorrectly labeled items occur among the labeler's audited labeled data, the number of correctly labeled items among the labeler's audited labeled data, and the total number of the labeler's audited labeled data.
Optionally, as shown in FIG. 4, the second computation submodule 31112 is configured to calculate the confidence value of the unaudited labeled data item by formula (4), based on the labeling duration the labeler spent on the item and the labeler's average labeling duration for correctly labeled items among the audited labeled data;
where M_j denotes the confidence value of the j-th unaudited labeled data item; T_ij denotes the labeling duration that labeler i, the labeler of the j-th unaudited item, spent labeling that item; R_i denotes labeler i's average labeling duration for correctly labeled items among the audited labeled data; and n is a constant greater than or equal to 1.
Optionally, as shown in FIG. 4, the second computation submodule 31112 is configured to calculate the confidence value of the unaudited labeled data item by formula (5), based on the time point at which the labeler labeled the item and the error time period corresponding to incorrectly labeled items among the labeler's audited labeled data;
where M_j denotes the confidence value of the j-th unaudited labeled data item; t_ij denotes the time point at which labeler i, the labeler of the j-th unaudited item, labeled that item; [t1_i, t2_i] denotes the error time period corresponding to incorrectly labeled items among labeler i's audited labeled data; and m1 and m2 are constants, with m2 greater than m1.
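Formula (5) is likewise not reproduced in this text. Given that m2 > m1, one natural reading, assumed here, is a step function: the lower constant m1 when the labeling time point falls inside the labeler's error time period, and m2 otherwise. Representing time points as seconds since midnight is also an assumption, and the function name is hypothetical.

```python
def error_period_confidence(t_ij: float, period: tuple[float, float],
                            m1: float, m2: float) -> float:
    """ASSUMED form of formula (5).

    t_ij:   time point at which labeler i labeled item j (seconds since midnight)
    period: (t1_i, t2_i), the period in which labeler i's audited errors occurred
    m1, m2: constants with m2 > m1
    """
    if not m2 > m1:
        raise ValueError("m2 must be greater than m1")
    t1, t2 = period
    return m1 if t1 <= t_ij <= t2 else m2


afternoon_slump = (14 * 3600, 15 * 3600)   # hypothetical 14:00 to 15:00 period
```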
Optionally, as shown in FIG. 4, the second computation submodule 31112 is configured to calculate the confidence value of the unaudited labeled data item by formula (6), based on the number of items between the unaudited item and its labeler's most recent incorrectly labeled item and the average interval (in number of items) at which incorrectly labeled items occur among the labeler's audited labeled data;
where M_j denotes the confidence value of the j-th unaudited labeled data item; P_ij denotes the number of items between the j-th unaudited item and the most recent incorrectly labeled item of its labeler i; Q_i denotes the average interval (in number of items) at which incorrectly labeled items occur among labeler i's audited labeled data; and k1 and k2 are constants, with k1 greater than k2.
Optionally, as shown in FIG. 4, the second computation submodule 31112 is configured to calculate the confidence value of the unaudited labeled data item by formula (7), based on the time point at which the labeler labeled the item, the error time period corresponding to incorrectly labeled items among the labeler's audited labeled data, the number of items between the unaudited item and its labeler's most recent incorrectly labeled item, and the average interval (in number of items) at which incorrectly labeled items occur among the labeler's audited labeled data;
where M_j denotes the confidence value of the j-th unaudited labeled data item; t_ij denotes the time point at which labeler i, the labeler of the j-th unaudited item, labeled that item; [t1_i, t2_i] denotes the error time period corresponding to incorrectly labeled items among labeler i's audited labeled data; m1 and m2 are constants, with m2 greater than m1; P_ij denotes the number of items between the j-th unaudited item and the most recent incorrectly labeled item of its labeler i; Q_i denotes the average interval (in number of items) at which incorrectly labeled items occur among labeler i's audited labeled data; and k1 and k2 are constants, with k1 greater than k2.
Optionally, as shown in FIG. 4, the second computation submodule 31112 is configured to calculate the confidence value of the unaudited labeled data item by formula (8), based on the labeling duration the labeler spent on the item, the time point at which the labeler labeled the item, the labeler's average labeling duration for correctly labeled items among the audited labeled data, the error time period corresponding to incorrectly labeled items among the labeler's audited labeled data, the number of correctly labeled items among the labeler's audited labeled data, and the total number of the labeler's audited labeled data;
where M_j denotes the confidence value of the j-th unaudited labeled data item; T_ij denotes the labeling duration that labeler i, the labeler of the j-th unaudited item, spent labeling that item; R_i denotes labeler i's average labeling duration for correctly labeled items among the audited labeled data; n is a constant greater than or equal to 1; t_ij denotes the time point at which labeler i labeled the j-th unaudited item; [t1_i, t2_i] denotes the error time period corresponding to incorrectly labeled items among labeler i's audited labeled data; m1 and m2 are constants, with m2 greater than m1; P_ij denotes the number of items between the j-th unaudited item and the most recent incorrectly labeled item of its labeler i; Q_i denotes the average interval (in number of items) at which incorrectly labeled items occur among labeler i's audited labeled data; k1 and k2 are constants, with k1 greater than k2; α denotes a first weight; β denotes a second weight; and γ denotes a third weight.
Optionally, as shown in FIG. 4, the third determining module 3111 includes:
a third computation submodule 31113, configured to calculate the confidence value of the unaudited labeled data item based on the content of the historical labeling behavior information of its labeler;
wherein the historical labeling behavior information of the labeler includes: the number of correctly labeled items among the labeler's audited labeled data and the total number of the labeler's audited labeled data.
Optionally, as shown in FIG. 4, the third computation submodule 31113 is configured to determine, as the confidence value of the unaudited labeled data item, the ratio of the number of correctly labeled items among the labeler's audited labeled data to the total number of the labeler's audited labeled data.
Optionally, as shown in FIG. 4, the device further includes:
an auxiliary unit 34, configured to assist the manual audit of the labeled data based on the confidence values of the labeled data.
For the methods used by each functional module of the audit push device for labeled data provided by the embodiment of the present invention, see the detailed descriptions of the corresponding methods in the method embodiments of FIG. 1 and FIG. 2; details are not repeated here.
Further, according to the above embodiments, another embodiment of the present invention provides a computer-readable storage medium that stores a program, where, when the program runs, the device on which the storage medium resides is controlled to execute the audit push method for labeled data according to any one of the above.
Further, according to the above embodiments, another embodiment of the present invention provides a storage management device, including:
a memory, configured to store a program; and
a processor, coupled to the memory and configured to run the program to execute the audit push method for labeled data according to any one of the above.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, see the related descriptions of the other embodiments.
The embodiments of the present invention disclose:
A1. An audit push method for labeled data, comprising:
determining the ordering of the unaudited labeled data in an unaudited labeled data set;
extracting, based on that ordering, unaudited labeled data from the unaudited labeled data set and pushing it to a designated auditor; and
collecting the audit result data of the auditors and, at a set frequency, updating the ordering of the unaudited labeled data in the unaudited labeled data set based on the collected audit result data.
A2. The method according to A1, wherein determining the ordering of the unaudited labeled data in the unaudited labeled data set comprises:
determining the confidence value of each labeled data item in the unaudited labeled data set, the confidence value being related to the probability that the corresponding item is labeled correctly; and
determining the ordering of the unaudited labeled data based on the confidence value of each item.
A3. The method according to A2, wherein determining the ordering of the unaudited labeled data based on the confidence value of each item comprises:
selecting, from the unaudited labeled data set, the unaudited labeled data whose confidence values lie within a preset threshold interval; and
determining the ordering of the selected unaudited labeled data according to the content of the audit behavior information of the labeler of each selected item;
wherein the audit behavior information includes one or more of the following: the number of consecutive correctly labeled items by the labeler, the number of consecutive incorrectly labeled items by the labeler, and the number of times the labeler has been reviewed.
A4. The method according to A2, wherein determining the ordering of the unaudited labeled data based on the confidence value of each item comprises:
determining the ordering score of each unaudited labeled data item using a first formula;
the first formula is:
where S_j denotes the ordering score of the j-th unaudited labeled data item; M_j denotes the confidence value of the j-th unaudited labeled data item; and M_in denotes the n-th unaudited labeled data item of labeler i among the unaudited labeled data; and
determining the ordering of the unaudited labeled data based on the magnitude of the ordering scores.
A5. The method according to A2, wherein determining the ordering of the unaudited labeled data based on the confidence value of each item comprises:
selecting, from the unaudited labeled data set, the unaudited labeled data whose confidence values lie within a preset threshold interval; and
determining the ordering of the selected unaudited labeled data based on the magnitude of their confidence values.
A6. The method according to A3 or A5, wherein, when the confidence value of each unaudited labeled data item is obtained from the historical labeling behavior information of the item's labeler, or from the labeler's labeling behavior information for that item, selecting, from the unaudited labeled data set, the unaudited labeled data whose confidence values lie within a preset threshold interval comprises:
selecting, from the unaudited labeled data set, the unaudited labeled data whose confidence values lie within a first threshold interval.
A7. The method according to A3 or A5, wherein, when an unaudited labeled data item has two confidence values, one obtained from the historical labeling behavior information of the item's labeler and the other obtained from both the labeler's labeling behavior information for that item and the labeler's historical labeling behavior information, selecting, from the unaudited labeled data set, the unaudited labeled data whose confidence values lie within a preset threshold interval comprises:
selecting, from the unaudited labeled data set, the unaudited labeled data whose confidence values obtained from the labeler's historical labeling behavior information lie within a second threshold interval; and
ranking the selected unaudited labeled data by the confidence values obtained from both the labeler's labeling behavior information for the item and the labeler's historical labeling behavior information, and selecting, from the selected unaudited labeled data, those whose confidence values lie within a third threshold interval.
A8. The method according to A2, wherein determining the confidence value of each labeled data item in the unaudited labeled data set comprises:
obtaining the audit result data of audited labeled data, and determining, based on the audit result data, the confidence value of each labeled data item in the unaudited labeled data set; wherein,
when the unaudited labeled data set is sorted for the first time, the audit result data are the audit result data of a set quantity of historically audited labeled data; and
when the unaudited labeled data set is not being sorted for the first time, the audit result data are the audit result data of the set quantity of audited labeled data together with the collected audit result data of the auditors.
A9. The method according to A8, wherein determining, based on the audit result data, the confidence value of each labeled data item in the unaudited labeled data set comprises:
determining, for each unaudited labeled data item, its confidence value based on the information included in the audit result data;
wherein the audit result data include the following information: the historical labeling behavior information of the labeler of the unaudited labeled data item, and/or the labeler's labeling behavior information for that item.
A10. The method according to A9, wherein determining the confidence value of the unaudited labeled data item based on the information included in the audit result data comprises:
calculating the confidence value of the unaudited labeled data item based on the content of the historical labeling behavior information of its labeler;
wherein the historical labeling behavior information includes: the number of correctly labeled items and the number of incorrectly labeled items among the audited labeled data of the item's labeler.
A11. The method according to A10, wherein calculating the confidence value of the unaudited labeled data item based on the content of the historical labeling behavior information of its labeler comprises:
calculating the confidence value of the unaudited labeled data item by a second formula;
the second formula is:
where M_j denotes the confidence value of the j-th unaudited labeled data item; A_i denotes the number of correctly labeled items among the audited labeled data of labeler i, the labeler of the j-th unaudited item; B_i denotes the number of incorrectly labeled items among the audited labeled data of labeler i; a is a constant greater than 0; and b is a constant greater than 0.
A12. The method according to A10, wherein calculating the confidence value of the unaudited labeled data item based on the content of the historical labeling behavior information of its labeler comprises:
calculating the confidence value of the unaudited labeled data item by a third formula;
the third formula is:
where M_j denotes the confidence value of the j-th unaudited labeled data item; A_i denotes the number of correctly labeled items among the audited labeled data of labeler i, the labeler of the j-th unaudited item; B_i denotes the number of incorrectly labeled items among the audited labeled data of labeler i; e is a constant greater than 0; f is a constant greater than 0; and g is a constant greater than 0.
A13. The method according to A9, wherein determining the confidence value of the unaudited labeled data based on the information included in the audit result data comprises:
calculating the confidence value of the unaudited labeled data based on the content included in the labeler's labeling behavior information for the unaudited labeled data and the content included in the historical labeling behavior information of the labeler of the unaudited labeled data; wherein
the labeler's labeling behavior information for the unaudited labeled data includes one or more of the following: the labeling duration the labeler took to label the unaudited labeled data, the time point at which the labeler labeled the unaudited labeled data, and the interval count (the number of data items in between) between the unaudited labeled data and the most recent labeled data that its labeler labeled incorrectly;
the historical labeling behavior information of the labeler of the unaudited labeled data includes one or more of the following: the labeler's average labeling duration for correctly labeled data among the labeler's audited labeled data, the error period corresponding to incorrectly labeled data among the labeler's audited labeled data, the average interval count between occurrences of incorrectly labeled data among the labeler's audited labeled data, the number of correctly labeled data among the labeler's audited labeled data, and the total amount of the labeler's audited labeled data.
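The history features listed in A13 can be derived from a labeler's audited record. The following Python sketch is illustrative only. Three pieces are assumptions and not part of the text: the record type `AuditedItem`, the function `history_features`, and the convention that an interval counts the items strictly between two consecutive errors. The error period [t1_i, t2_i] is left out because the text does not say how it is derived.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AuditedItem:
    """One audited item produced by labeler i (hypothetical record type)."""
    correct: bool       # audit verdict for the label
    duration_s: float   # labeling duration for this item, in seconds


@dataclass
class HistoryFeatures:
    correct_count: int                      # A_i: correctly labeled audited items
    total_count: int                        # total audited items of labeler i
    avg_correct_duration: Optional[float]   # R_i: mean duration over correct items
    avg_error_interval: Optional[float]     # Q_i: mean gap between consecutive errors


def history_features(items: List[AuditedItem]) -> HistoryFeatures:
    """Derive history features from labeler i's audited items, oldest first."""
    correct = [it for it in items if it.correct]
    r_i = sum(it.duration_s for it in correct) / len(correct) if correct else None

    # Positions (in labeling order) of the items the audit found wrong.
    errors = [k for k, it in enumerate(items) if not it.correct]
    # Assumption: the interval between two errors is the number of items strictly between them.
    gaps = [b - a - 1 for a, b in zip(errors, errors[1:])]
    q_i = sum(gaps) / len(gaps) if gaps else None

    return HistoryFeatures(len(correct), len(items), r_i, q_i)
```

Consider a labeler whose five audited items are correct, wrong, correct, correct, wrong, with durations of 2, 5, 4, 3 and 6 s. That labeler gets A_i = 3, a total of 5, R_i = 3.0 s and Q_i = 2.0.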
A14. The method according to A13, wherein calculating the confidence value of the unaudited labeled data based on the content included in the labeler's labeling behavior information for the unaudited labeled data and the content included in the historical labeling behavior information of the labeler of the unaudited labeled data comprises:
calculating the confidence value of the unaudited labeled data by a fourth formula, based on the labeling duration the labeler took to label the unaudited labeled data and the labeler's average labeling duration for correctly labeled data among the labeler's audited labeled data;
the fourth formula is as follows:
where M_j denotes the confidence value of the j-th unaudited labeled data; T_ij denotes the labeling duration that labeler i of the j-th unaudited labeled data took to label the j-th unaudited labeled data; R_i denotes the average labeling duration of labeler i for correctly labeled data among its audited labeled data; and n is a constant greater than or equal to 1.
A15. The method according to A13, wherein calculating the confidence value of the unaudited labeled data based on the content included in the labeler's labeling behavior information for the unaudited labeled data and the content included in the historical labeling behavior information of the labeler of the unaudited labeled data comprises:
calculating the confidence value of the unaudited labeled data by a fifth formula, based on the time point at which the labeler labeled the unaudited labeled data and the error period corresponding to incorrectly labeled data among the labeler's audited labeled data;
the fifth formula is as follows:
where M_j denotes the confidence value of the j-th unaudited labeled data; t_ij denotes the time point at which labeler i of the j-th unaudited labeled data labeled the j-th unaudited labeled data; [t1_i, t2_i] denotes the error period corresponding to incorrectly labeled data among the audited labeled data of labeler i; and m1 and m2 are constants, with m2 greater than m1.
A16. The method according to A13, wherein calculating the confidence value of the unaudited labeled data based on the content included in the labeler's labeling behavior information for the unaudited labeled data and the content included in the historical labeling behavior information of the labeler of the unaudited labeled data comprises:
calculating the confidence value of the unaudited labeled data by a sixth formula, based on the interval count between the unaudited labeled data and the most recent labeled data that its labeler labeled incorrectly, and on the average interval count between occurrences of incorrectly labeled data among the labeler's audited labeled data;
the sixth formula is as follows:
where M_j denotes the confidence value of the j-th unaudited labeled data; P_ij denotes the interval count between the j-th unaudited labeled data and the most recent labeled data that its labeler i labeled incorrectly; Q_i denotes the average interval count between occurrences of incorrectly labeled data among the audited labeled data of labeler i; and k1 and k2 are constants, with k1 greater than k2.
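P_ij and Q_i are the two inputs to the sixth formula, whose exact form is not reproduced in this text. The sketch below shows one way to compute P_ij from labeler i's items in labeling order. The function name and the `None`-for-unaudited encoding are hypothetical. The sketch also assumes that only items already audited as wrong count as errors.

```python
from typing import Optional, Sequence


def interval_since_last_error(verdicts: Sequence[Optional[bool]], j: int) -> Optional[int]:
    """Return P_ij for the item at position j of labeler i's labeling sequence.

    verdicts[k] is True (audited, correct), False (audited, wrong) or
    None (not yet audited). The result is the number of items strictly
    between position j and the most recent earlier item known to be wrong,
    or None if labeler i has no earlier known error.
    """
    for k in range(j - 1, -1, -1):   # walk backwards from the item just before j
        if verdicts[k] is False:     # unaudited items (None) are not treated as errors
            return j - k - 1
    return None
```
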
A17. The method according to A13, wherein calculating the confidence value of the unaudited labeled data based on the content included in the labeler's labeling behavior information for the unaudited labeled data and the content included in the historical labeling behavior information of the labeler of the unaudited labeled data comprises:
calculating the confidence value of the unaudited labeled data by a seventh formula, based on the following: the time point at which the labeler labeled the unaudited labeled data; the error period corresponding to incorrectly labeled data among the labeler's audited labeled data; the interval count between the unaudited labeled data and the most recent labeled data that its labeler labeled incorrectly; and the average interval count between occurrences of incorrectly labeled data among the labeler's audited labeled data;
the seventh formula is as follows:
where M_j denotes the confidence value of the j-th unaudited labeled data; t_ij denotes the time point at which labeler i of the j-th unaudited labeled data labeled the j-th unaudited labeled data; [t1_i, t2_i] denotes the error period corresponding to incorrectly labeled data among the audited labeled data of labeler i; m1 and m2 are constants, with m2 greater than m1; P_ij denotes the interval count between the j-th unaudited labeled data and the most recent labeled data that its labeler i labeled incorrectly; Q_i denotes the average interval count between occurrences of incorrectly labeled data among the audited labeled data of labeler i; and k1 and k2 are constants, with k1 greater than k2.
A18. The method according to A13, wherein calculating the confidence value of the unaudited labeled data based on the content included in the labeler's labeling behavior information for the unaudited labeled data and the content included in the historical labeling behavior information of the labeler of the unaudited labeled data comprises:
calculating the confidence value of the unaudited labeled data by an eighth formula, based on the following: the labeling duration the labeler took to label the unaudited labeled data; the time point at which the labeler labeled the unaudited labeled data; the labeler's average labeling duration for correctly labeled data among the labeler's audited labeled data; the error period corresponding to incorrectly labeled data among the labeler's audited labeled data; the number of correctly labeled data among the labeler's audited labeled data; and the total amount of the labeler's audited labeled data;
the eighth formula is as follows:
where M_j denotes the confidence value of the j-th unaudited labeled data; T_ij denotes the labeling duration that labeler i of the j-th unaudited labeled data took to label the j-th unaudited labeled data; R_i denotes the average labeling duration of labeler i for correctly labeled data among its audited labeled data; n is a constant greater than or equal to 1; t_ij denotes the time point at which labeler i of the j-th unaudited labeled data labeled the j-th unaudited labeled data; [t1_i, t2_i] denotes the error period corresponding to incorrectly labeled data among the audited labeled data of labeler i; m1 and m2 are constants, with m2 greater than m1; P_ij denotes the interval count between the j-th unaudited labeled data and the most recent labeled data that its labeler i labeled incorrectly; Q_i denotes the average interval count between occurrences of incorrectly labeled data among the audited labeled data of labeler i; k1 and k2 are constants, with k1 greater than k2; α denotes a first weight; β denotes a second weight; and γ denotes a third weight.
A19. The method according to A9, wherein determining the confidence value of the unaudited labeled data based on the information included in the audit result data comprises:
calculating the confidence value of the unaudited labeled data based on the content included in the historical labeling behavior information of the labeler of the unaudited labeled data;
wherein the historical labeling behavior information of the labeler of the unaudited labeled data includes: the number of correctly labeled data among the labeler's audited labeled data and the total amount of the labeler's audited labeled data.
A20. The method according to A19, wherein calculating the confidence value of the unaudited labeled data based on the content included in the historical labeling behavior information of the labeler of the unaudited labeled data comprises:
determining, as the confidence value of the unaudited labeled data, the ratio of the number of correctly labeled data among the labeler's audited labeled data to the total amount of the labeler's audited labeled data.
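Unlike the formula-based variants, A20 fully specifies the computation, so it can be written directly. A minimal Python sketch follows. The text does not cover a labeler with no audited data, so rejecting that case is an assumption.

```python
def ratio_confidence(correct_count: int, total_count: int) -> float:
    """A20/B20: correctly labeled audited items divided by all audited items of the labeler."""
    if total_count <= 0:
        # Assumption: the text does not say what to do for a labeler with no audit history.
        raise ValueError("labeler has no audited labeled data")
    if not 0 <= correct_count <= total_count:
        raise ValueError("correct_count must lie in [0, total_count]")
    return correct_count / total_count
```

For example, a labeler with 45 correct labels out of 50 audited items gives every one of their unaudited items a confidence value of 0.9.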
A21. The method according to any one of A1 to A5 and A8 to A20, further comprising: assisting the manual audit process of labeled data based on the confidence values of the labeled data.
B1. An audit push device for labeled data, comprising:
a determination unit, configured to determine the ordering of each unaudited labeled data in an unaudited labeled data set;
a push unit, configured to extract unaudited labeled data from the unaudited labeled data set based on that ordering and push it to a designated auditor; and
an updating unit, configured to collect the audit result data of the auditor and, at a certain frequency, update the ordering of each unaudited labeled data in the unaudited labeled data set based on the collected audit result data.
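The three units of B1 (the same steps as claim 1) can be pictured as one small object. This Python sketch is a hypothetical illustration, not the patented implementation, and several parts of it are assumptions:

- `AuditPusher`, `score_fn` and `refresh_every` are invented names.
- Items with lower scores are pushed first.
- "A certain frequency" means re-ranking after every `refresh_every` collected results.
- Pushing to an auditor is reduced to returning the item.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Optional

Item = Hashable


class AuditPusher:
    """Determination, push and updating units of B1 in one object (sketch)."""

    def __init__(self, items: Iterable[Item],
                 score_fn: Callable[[Item, Dict[Item, bool]], float],
                 refresh_every: int = 10) -> None:
        self._score_fn = score_fn            # hypothetical: (item, results so far) -> score
        self._refresh_every = refresh_every  # assumed meaning of "a certain frequency"
        self._pending: List[Item] = list(items)
        self._results: Dict[Item, bool] = {}
        self._since_refresh = 0
        self._rank()

    def _rank(self) -> None:
        """Determination unit: order the unaudited items, lowest score first."""
        self._pending.sort(key=lambda it: self._score_fn(it, self._results))

    def push(self) -> Optional[Item]:
        """Push unit: hand the front unaudited item to the auditor, or None if none remain."""
        return self._pending.pop(0) if self._pending else None

    def collect(self, item: Item, correct: bool) -> None:
        """Updating unit: record an audit verdict and re-rank at the fixed frequency."""
        self._results[item] = correct
        self._since_refresh += 1
        if self._since_refresh >= self._refresh_every:
            self._since_refresh = 0
            self._rank()
```

The scoring function is pluggable, so the same loop can run with any of the confidence values described above (for example, the A20 ratio) without changing the push or update logic.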
B2. The device according to B1, wherein the determination unit comprises:
a first determination subunit, configured to determine a confidence value of each labeled data in the unaudited labeled data set, the confidence value being related to the probability that the labeling of the corresponding labeled data is correct; and
a second determination subunit, configured to determine the ordering of each unaudited labeled data based on the confidence value of each labeled data.
B3. The device according to B2, wherein the second determination subunit comprises:
a first selection module, configured to select, from the unaudited labeled data set, unaudited labeled data whose confidence values fall within a preset threshold interval; and
a first determination module, configured to determine the ordering of each selected unaudited labeled data according to the content included in the audit behavior information of the labeler of each selected unaudited labeled data;
wherein the audit behavior information includes one or more of the following: the number of consecutive correctly labeled data produced by the labeler, the number of consecutive incorrectly labeled data produced by the labeler, and the number of times the labeler has been audited.
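B3 first narrows the candidates by a confidence interval and then orders them by the labeler's audit behavior. It does not say how the three behavior quantities are combined. The Python sketch below therefore assumes a closed interval and one possible sort key:

1. Longest current error streak first.
2. Then fewest audits.
3. Then shortest correct streak.

`Candidate` and `select_and_order` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    item_id: str
    confidence: float          # confidence value of the unaudited item
    consecutive_correct: int   # labeler's current run of correctly labeled items
    consecutive_errors: int    # labeler's current run of incorrectly labeled items
    times_audited: int         # number of times the labeler has been audited


def select_and_order(cands: List[Candidate], low: float, high: float) -> List[str]:
    """Keep items whose confidence lies in [low, high], then order by audit behavior."""
    chosen = [c for c in cands if low <= c.confidence <= high]
    # Assumed priority: error streaks first, then rarely audited labelers,
    # then labelers with short correct streaks.
    chosen.sort(key=lambda c: (-c.consecutive_errors, c.times_audited, c.consecutive_correct))
    return [c.item_id for c in chosen]
```
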
B4. The device according to B2, wherein the determination unit comprises:
a third determination subunit, configured to determine an ordering score of each unaudited labeled data using a first formula, and to determine the ordering of each unaudited labeled data based on the magnitudes of the ordering scores;
the first formula is as follows:
where S_j denotes the ordering score of the j-th unaudited labeled data; M_j denotes the confidence value of the j-th unaudited labeled data; and M_in denotes the confidence value of the n-th unaudited labeled data of labeler i among the unaudited labeled data.
B5. The device according to B2, wherein the second determination subunit comprises:
a second selection module, configured to select, from the unaudited labeled data set, unaudited labeled data whose confidence values fall within a preset threshold interval; and
a second determination module, configured to determine the ordering of each selected unaudited labeled data based on the magnitudes of the confidence values of the selected labeled data.
B6. The device according to B3 or B5, wherein, when the confidence value of each unaudited labeled data is obtained based on the historical labeling behavior information of the labeler of the labeled data, or based on the labeler's labeling behavior information for that labeled data,
the selection module is configured to select, from the unaudited labeled data set, unaudited labeled data whose confidence values fall within a first threshold interval.
B7. The device according to B3 or B5, wherein each unaudited labeled data may have two corresponding confidence values: one obtained based on the historical labeling behavior information of the labeler of the labeled data, and the other obtained based on both the labeler's labeling behavior information for that labeled data and the labeler's historical labeling behavior information. In that case, the selection module is configured to perform the following steps:
select, from the unaudited labeled data set, unaudited labeled data whose confidence value obtained from the labeler's historical labeling behavior information falls within a second threshold interval;
rank the selected unaudited labeled data by the confidence value obtained from the labeler's labeling behavior information for that labeled data together with the labeler's historical labeling behavior information; and
select, from the selected unaudited labeled data, the unaudited labeled data whose confidence value falls within a third threshold interval.
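The two-stage selection of B7 can be sketched as follows. `Scored`, `two_stage_select` and the ascending sort on the combined confidence are assumptions made for illustration. The text only states that the selected data are ranked by that value. Intervals are treated as closed.

```python
from typing import List, NamedTuple, Tuple


class Scored(NamedTuple):
    item_id: str
    history_conf: float    # confidence from the labeler's historical behavior only
    combined_conf: float   # confidence from per-item behavior plus history


def two_stage_select(items: List[Scored],
                     second: Tuple[float, float],
                     third: Tuple[float, float]) -> List[str]:
    """Filter by history confidence, rank by combined confidence, filter again."""
    lo2, hi2 = second
    stage1 = [s for s in items if lo2 <= s.history_conf <= hi2]
    stage1.sort(key=lambda s: s.combined_conf)   # assumed ascending order
    lo3, hi3 = third
    return [s.item_id for s in stage1 if lo3 <= s.combined_conf <= hi3]
```
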
B8. The device according to B2, wherein the first determination subunit comprises:
a third determination module, configured to acquire audit result data of audited labeled data and to determine, based on the audit result data, the confidence value of each labeled data in the unaudited labeled data set; wherein
when the unaudited labeled data set is sorted for the first time, the audit result data are the audit result data of a set quantity of historically audited labeled data;
when the unaudited labeled data set is not being sorted for the first time, the audit result data are the audit result data of the set quantity of audited labeled data together with the collected audit result data of the auditor.
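B8 changes the audit data source between the first sort and later re-sorts. Below is a minimal Python sketch. It assumes that the "set quantity" of historical results means the most recent ones, since the text does not say which. The function name is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def audit_data_for_ranking(history: Sequence[T], collected: Sequence[T],
                           set_quantity: int, first_sort: bool) -> List[T]:
    """Pick the audit result data used to compute confidence values (B8)."""
    # Assumption: the set quantity of historical results is taken from the most recent end.
    base = list(history[-set_quantity:]) if set_quantity > 0 else []
    if first_sort:
        return base
    # Later sorts also use the results newly collected from the auditors.
    return base + list(collected)
```
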
B9. The device according to B8, wherein the third determination module is configured to determine, for each unaudited labeled data, the confidence value of the unaudited labeled data based on the information included in the audit result data;
wherein the audit result data include the following information: the historical labeling behavior information of the labeler of the unaudited labeled data, and/or the labeler's labeling behavior information for the unaudited labeled data.
B10. The device according to B9, wherein the third determination module comprises:
a first calculation submodule, configured to calculate the confidence value of the unaudited labeled data based on the content included in the historical labeling behavior information of the labeler of the unaudited labeled data;
wherein the historical labeling behavior information includes: the number of correctly labeled data and the number of incorrectly labeled data among the audited labeled data of the labeler of the unaudited labeled data.
B11. The device according to B10, wherein the first calculation submodule is configured to calculate the confidence value of the unaudited labeled data by a second formula;
the second formula is as follows:
where M_j denotes the confidence value of the j-th unaudited labeled data; A_i denotes the number of correctly labeled data among the audited labeled data of labeler i of the j-th unaudited labeled data; B_i denotes the number of incorrectly labeled data among the audited labeled data of labeler i of the j-th unaudited labeled data; and a and b are each constants greater than 0.
B12. The device according to B10, wherein the first calculation submodule is configured to calculate the confidence value of the unaudited labeled data by a third formula;
the third formula is as follows:
where M_j denotes the confidence value of the j-th unaudited labeled data; A_i denotes the number of correctly labeled data among the audited labeled data of labeler i of the j-th unaudited labeled data; B_i denotes the number of incorrectly labeled data among the audited labeled data of labeler i of the j-th unaudited labeled data; and e, f, and g are each constants greater than 0.
B13. The device according to B9, wherein the third determination module comprises:
a second calculation submodule, configured to calculate the confidence value of the unaudited labeled data based on the content included in the labeler's labeling behavior information for the unaudited labeled data and the content included in the historical labeling behavior information of the labeler of the unaudited labeled data; wherein
the labeler's labeling behavior information for the unaudited labeled data includes one or more of the following: the labeling duration the labeler took to label the unaudited labeled data, the time point at which the labeler labeled the unaudited labeled data, and the interval count (the number of data items in between) between the unaudited labeled data and the most recent labeled data that its labeler labeled incorrectly;
the historical labeling behavior information of the labeler of the unaudited labeled data includes one or more of the following: the labeler's average labeling duration for correctly labeled data among the labeler's audited labeled data, the error period corresponding to incorrectly labeled data among the labeler's audited labeled data, the average interval count between occurrences of incorrectly labeled data among the labeler's audited labeled data, the number of correctly labeled data among the labeler's audited labeled data, and the total amount of the labeler's audited labeled data.
B14. The device according to B13, wherein the second calculation submodule is configured to calculate the confidence value of the unaudited labeled data by a fourth formula, based on the labeling duration the labeler took to label the unaudited labeled data and the labeler's average labeling duration for correctly labeled data among the labeler's audited labeled data;
the fourth formula is as follows:
where M_j denotes the confidence value of the j-th unaudited labeled data; T_ij denotes the labeling duration that labeler i of the j-th unaudited labeled data took to label the j-th unaudited labeled data; R_i denotes the average labeling duration of labeler i for correctly labeled data among its audited labeled data; and n is a constant greater than or equal to 1.
B15. The device according to B13, wherein the second calculation submodule is configured to calculate the confidence value of the unaudited labeled data by a fifth formula, based on the time point at which the labeler labeled the unaudited labeled data and the error period corresponding to incorrectly labeled data among the labeler's audited labeled data;
the fifth formula is as follows:
where M_j denotes the confidence value of the j-th unaudited labeled data; t_ij denotes the time point at which labeler i of the j-th unaudited labeled data labeled the j-th unaudited labeled data; [t1_i, t2_i] denotes the error period corresponding to incorrectly labeled data among the audited labeled data of labeler i; and m1 and m2 are constants, with m2 greater than m1.
B16. The device according to B13, wherein the second calculation submodule is configured to calculate the confidence value of the unaudited labeled data by a sixth formula, based on the interval count between the unaudited labeled data and the most recent labeled data that its labeler labeled incorrectly, and on the average interval count between occurrences of incorrectly labeled data among the labeler's audited labeled data;
the sixth formula is as follows:
where M_j denotes the confidence value of the j-th unaudited labeled data; P_ij denotes the interval count between the j-th unaudited labeled data and the most recent labeled data that its labeler i labeled incorrectly; Q_i denotes the average interval count between occurrences of incorrectly labeled data among the audited labeled data of labeler i; and k1 and k2 are constants, with k1 greater than k2.
B17. The device according to B13, wherein the second calculation submodule is configured to calculate the confidence value of the unaudited labeled data by a seventh formula, based on the following: the time point at which the labeler labeled the unaudited labeled data; the error period corresponding to incorrectly labeled data among the labeler's audited labeled data; the interval count between the unaudited labeled data and the most recent labeled data that its labeler labeled incorrectly; and the average interval count between occurrences of incorrectly labeled data among the labeler's audited labeled data;
the seventh formula is as follows:
where M_j denotes the confidence value of the j-th unaudited labeled data; t_ij denotes the time point at which labeler i of the j-th unaudited labeled data labeled the j-th unaudited labeled data; [t1_i, t2_i] denotes the error period corresponding to incorrectly labeled data among the audited labeled data of labeler i; m1 and m2 are constants, with m2 greater than m1; P_ij denotes the interval count between the j-th unaudited labeled data and the most recent labeled data that its labeler i labeled incorrectly; Q_i denotes the average interval count between occurrences of incorrectly labeled data among the audited labeled data of labeler i; and k1 and k2 are constants, with k1 greater than k2.
B18. The device according to B13, wherein the second calculation submodule is configured to calculate the confidence value of the unaudited labeled data by an eighth formula, based on the following: the labeling duration the labeler took to label the unaudited labeled data; the time point at which the labeler labeled the unaudited labeled data; the labeler's average labeling duration for correctly labeled data among the labeler's audited labeled data; the error period corresponding to incorrectly labeled data among the labeler's audited labeled data; the number of correctly labeled data among the labeler's audited labeled data; and the total amount of the labeler's audited labeled data;
the eighth formula is as follows:
where M_j denotes the confidence value of the j-th unaudited labeled data; T_ij denotes the labeling duration that labeler i of the j-th unaudited labeled data took to label the j-th unaudited labeled data; R_i denotes the average labeling duration of labeler i for correctly labeled data among its audited labeled data; n is a constant greater than or equal to 1; t_ij denotes the time point at which labeler i of the j-th unaudited labeled data labeled the j-th unaudited labeled data; [t1_i, t2_i] denotes the error period corresponding to incorrectly labeled data among the audited labeled data of labeler i; m1 and m2 are constants, with m2 greater than m1; P_ij denotes the interval count between the j-th unaudited labeled data and the most recent labeled data that its labeler i labeled incorrectly; Q_i denotes the average interval count between occurrences of incorrectly labeled data among the audited labeled data of labeler i; k1 and k2 are constants, with k1 greater than k2; α denotes a first weight; β denotes a second weight; and γ denotes a third weight.
B19. The device according to B9, wherein the third determination module comprises:
a third calculation submodule, configured to calculate the confidence value of the unaudited labeled data based on the content included in the historical labeling behavior information of the labeler of the unaudited labeled data;
wherein the historical labeling behavior information of the labeler of the unaudited labeled data includes: the number of correctly labeled data among the labeler's audited labeled data and the total amount of the labeler's audited labeled data.
B20. The device according to B19, wherein the third calculation submodule is configured to determine, as the confidence value of the unaudited labeled data, the ratio of the number of correctly labeled data among the labeler's audited labeled data to the total amount of the labeler's audited labeled data.
B21. The device according to any one of B1 to B5 and B8 to B20, further comprising:
an auxiliary unit, configured to assist the manual audit process of labeled data based on the confidence values of the labeled data.
C1. A computer-readable storage medium, comprising a stored program, wherein, when the program runs, a device on which the storage medium resides is controlled to perform the audit push method for labeled data according to any one of A1 to A21.
D1. A storage management device, comprising:
a memory, configured to store a program; and
a processor, coupled to the memory and configured to run the program to perform the audit push method for labeled data according to any one of A1 to A21.
It can be understood that related features in the above method and device may refer to each other. In addition, "first", "second", and the like in the above embodiments serve to distinguish the embodiments and do not indicate that any embodiment is better or worse than another.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above can be found in the corresponding processes of the foregoing method embodiments, and details are not repeated here.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to construct such systems is apparent. In addition, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein may be implemented in various programming languages, and that the above description of a specific language is provided to disclose the best mode of the invention.
Numerous specific details are set forth in the description provided here. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in the above description of exemplary embodiments, features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof. This is done to streamline the disclosure and aid understanding of one or more of the various inventive aspects. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into that detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will understand that the modules in the device of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components in an embodiment may be combined into one module, unit, or component, and may furthermore be divided into multiple submodules, subunits, or subcomponents. All features disclosed in this specification (including the accompanying claims, abstract, and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination. The only exception is where at least some of such features and/or processes or units are mutually exclusive. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.
Furthermore, some embodiments described herein include certain features included in other embodiments but not others. Those skilled in the art will understand that combinations of features of different embodiments are nevertheless meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any one of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in the method, apparatus, and framework for running a deep neural network model according to embodiments of the invention. The invention may also be implemented as device or apparatus programs (for example, computer programs and computer program products) for performing some or all of the methods described herein. Such programs implementing the invention may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claims. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, and third does not indicate any order; these words may be interpreted as names.
Claims (10)
1. a kind of audit method for pushing of labeled data characterized by comprising
Determine the sequence for not auditing labeled data respectively for not auditing labeled data concentration;
It is sorted based on the labeled data of not auditing respectively for not auditing labeled data concentration, never audit labeled data concentration is extracted unexamined
Core labeled data is pushed to the auditor of setting;And
The auditing result data for collecting auditor are based on collected auditing result data according to certain frequency, and update is not audited
The sequence for not auditing labeled data respectively that labeled data is concentrated.
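The loop in claim 1 (rank, push, collect results, periodically re-rank) can be sketched as follows. This is a hypothetical illustration, not the patented implementation: the names `LabeledItem`, `AuditPusher`, `refresh_every` and `confidence_with_audit_feedback` are invented here, the "certain frequency" is assumed to be "every `refresh_every` collected results", and lower-ranked keys are assumed to be pushed first (least-trusted labels audited first).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class LabeledItem:
    item_id: str
    labeler_id: str
    confidence: float  # assumed: estimated probability that the label is correct


# Hypothetical ranking key: receives an item plus all audit outcomes collected so far,
# grouped by labeler (True = label judged correct). Lower keys are pushed first.
RankKey = Callable[[LabeledItem, Dict[str, List[bool]]], float]


class AuditPusher:
    """Hypothetical sketch of claim 1: rank, push, collect, re-rank periodically."""

    def __init__(self, items: List[LabeledItem], rank_key: RankKey, refresh_every: int = 10):
        self.pending = list(items)          # the unaudited labeled data set
        self.rank_key = rank_key
        self.refresh_every = refresh_every  # assumed meaning of "a certain frequency"
        self.results: Dict[str, List[bool]] = {}
        self._since_refresh = 0
        self.rerank()

    def rerank(self) -> None:
        """Re-sort the pending set using the audit results collected so far."""
        self.pending.sort(key=lambda it: self.rank_key(it, self.results))
        self._since_refresh = 0

    def push(self, batch_size: int) -> List[LabeledItem]:
        """Remove the top-ranked items from the pending set and hand them to an auditor."""
        batch, self.pending = self.pending[:batch_size], self.pending[batch_size:]
        return batch

    def collect(self, item: LabeledItem, is_correct: bool) -> None:
        """Record one audit result; re-rank once enough new results have arrived."""
        self.results.setdefault(item.labeler_id, []).append(is_correct)
        self._since_refresh += 1
        if self._since_refresh >= self.refresh_every:
            self.rerank()


def confidence_with_audit_feedback(item: LabeledItem, results: Dict[str, List[bool]]) -> float:
    """Example key: confidence scaled by the labeler's observed audit accuracy (1.0 if unaudited)."""
    history = results.get(item.labeler_id, [])
    accuracy = sum(history) / len(history) if history else 1.0
    return item.confidence * accuracy
```

With this choice of key, once an auditor rejects one of a labeler's items, that labeler's remaining items drop toward the front of the queue at the next refresh.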
2. The method according to claim 1, characterized in that the determining a ranking of each piece of unaudited labeled data in the unaudited labeled data set comprises:
determining a confidence value of each piece of labeled data in the unaudited labeled data set, the confidence value being related to the probability that the corresponding labeled data is labeled correctly; and
determining the ranking of each piece of unaudited labeled data based on the confidence value of each piece of labeled data.
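Claim 2 only requires the confidence value to be related to the probability that a label is correct. One hedged way to obtain such a value, assumed here and not taken from the patent, is the labeler's Laplace-smoothed historical accuracy; the helper names `labeler_confidence` and `rank_by_confidence` are hypothetical, and ascending order (least confident first) is an assumption.

```python
from typing import Dict, List, Tuple


def labeler_confidence(correct: int, total: int,
                       prior_correct: float = 1.0, prior_total: float = 2.0) -> float:
    """Smoothed historical accuracy used as a stand-in confidence value.

    Assumption: with no history the estimate falls back to prior_correct / prior_total (0.5).
    """
    return (correct + prior_correct) / (total + prior_total)


def rank_by_confidence(items: List[Tuple[str, str]],
                       history: Dict[str, Tuple[int, int]]) -> List[str]:
    """items: (item_id, labeler_id); history: labeler_id -> (correct, total).

    Returns item ids ordered from lowest to highest confidence (assumed audit priority).
    """
    def conf(item: Tuple[str, str]) -> float:
        correct, total = history.get(item[1], (0, 0))
        return labeler_confidence(correct, total)

    return [item_id for item_id, _ in sorted(items, key=conf)]
```

Smoothing keeps a labeler with very little history from receiving a confidence of exactly 0 or 1.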
3. The method according to claim 2, characterized in that the determining, based on the confidence value of each piece of labeled data, a ranking of each piece of unaudited labeled data comprises:
selecting, from the unaudited labeled data set, unaudited labeled data whose confidence value lies within a preset threshold interval; and
determining the ranking of each piece of the selected unaudited labeled data according to the content of the audit behavior information of the labeler of each piece of the selected unaudited labeled data;
wherein the audit behavior information includes one or more of the following: the number of pieces of labeled data the labeler has labeled correctly in succession, the number of pieces of labeled data the labeler has labeled incorrectly in succession, and the number of times the labeler has been audited.
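Claim 3 names three kinds of audit behavior information but does not say how they are combined. The sketch below assumes a lexicographic order: more consecutive errors first, then fewer consecutive correct labels, then fewer past audits. It also assumes a closed threshold interval. `AuditBehavior` and `select_and_rank` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class AuditBehavior:
    consecutive_correct: int = 0  # pieces of labeled data labeled correctly in succession
    consecutive_errors: int = 0   # pieces of labeled data labeled incorrectly in succession
    times_audited: int = 0        # number of times the labeler has been audited


def select_and_rank(items: List[Tuple[str, str, float]],
                    behavior: Dict[str, AuditBehavior],
                    interval: Tuple[float, float]) -> List[str]:
    """items: (item_id, labeler_id, confidence). Returns selected item ids in audit order."""
    low, high = interval
    # Keep only items whose confidence lies in the (assumed closed) threshold interval.
    selected = [it for it in items if low <= it[2] <= high]

    def key(it: Tuple[str, str, float]) -> Tuple[int, int, int]:
        b = behavior.get(it[1], AuditBehavior())
        # Assumed priority: error streaks first, then short correct streaks, then rarely audited.
        return (-b.consecutive_errors, b.consecutive_correct, b.times_audited)

    return [it[0] for it in sorted(selected, key=key)]
```

Any other combination, such as a weighted sum, would fit the claim equally well; the tuple key simply keeps the assumed priority explicit.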
4. The method according to claim 2, characterized in that the determining, based on the confidence value of each piece of labeled data, a ranking of each piece of unaudited labeled data comprises:
determining a ranking score of each piece of unaudited labeled data using a first formula;
the first formula is:
wherein S_j denotes the ranking score of the j-th piece of unaudited labeled data; M_j denotes the confidence value of the j-th piece of unaudited labeled data; and M_in denotes the n-th piece of unaudited labeled data of labeler i among the unaudited labeled data; and
determining the ranking of each piece of unaudited labeled data based on the magnitude of the ranking score.
5. The method according to claim 2, characterized in that the determining, based on the confidence value of each piece of labeled data, a ranking of each piece of unaudited labeled data comprises:
selecting, from the unaudited labeled data set, unaudited labeled data whose confidence value lies within a preset threshold interval; and
determining the ranking of each piece of the selected unaudited labeled data based on the magnitude of its confidence value.
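Claim 5 reduces to a filter followed by a sort on confidence. A minimal sketch, assuming a closed interval and ascending order by default (the claim fixes neither). `select_by_confidence` is a hypothetical name.

```python
from typing import List, Tuple


def select_by_confidence(items: List[Tuple[str, float]], low: float, high: float,
                         ascending: bool = True) -> List[str]:
    """Keep items whose confidence lies in [low, high] and order them by confidence."""
    if low > high:
        raise ValueError("threshold interval must satisfy low <= high")
    selected = [(item_id, c) for item_id, c in items if low <= c <= high]
    selected.sort(key=lambda pair: pair[1], reverse=not ascending)
    return [item_id for item_id, _ in selected]
```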
6. The method according to claim 3 or 5, characterized in that, when the confidence value of each piece of unaudited labeled data is obtained based on the historical labeling behavior information of the labeler of the labeled data, or when the confidence value of each piece of unaudited labeled data is obtained based on the labeler's labeling behavior information for that labeled data, the selecting, from the unaudited labeled data set, unaudited labeled data whose confidence value lies within a preset threshold interval comprises:
selecting, from the unaudited labeled data set, unaudited labeled data whose confidence value lies within a first threshold interval.
7. The method according to claim 3 or 5, characterized in that, when each piece of unaudited labeled data corresponds to two confidence values, one obtained based on the historical labeling behavior information of the labeler of the labeled data, and the other obtained based on both the labeler's labeling behavior information for that labeled data and the labeler's historical labeling behavior information, the selecting, from the unaudited labeled data set, unaudited labeled data whose confidence value lies within a preset threshold interval comprises:
selecting, from the unaudited labeled data set, unaudited labeled data whose confidence value obtained from the labeler's historical labeling behavior information lies within a second threshold interval; and
ranking the selected unaudited labeled data by the confidence value obtained from the labeler's labeling behavior information for the labeled data together with the labeler's historical labeling behavior information, and selecting, from the selected unaudited labeled data, unaudited labeled data whose such confidence value lies within a third threshold interval.
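The two-stage selection in claim 7 can be sketched as below, assuming both confidence values are already computed, both intervals are closed, and ranking is by ascending combined confidence. The type `TwoConfidenceItem` and the function `two_stage_select` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TwoConfidenceItem:
    item_id: str
    history_conf: float   # from the labeler's historical labeling behavior only
    combined_conf: float  # from labeling behavior on this item plus the labeler's history


def two_stage_select(items: List[TwoConfidenceItem],
                     second: Tuple[float, float],
                     third: Tuple[float, float]) -> List[str]:
    """Filter by history_conf in `second`, rank by combined_conf, keep those in `third`."""
    lo2, hi2 = second
    lo3, hi3 = third
    # Stage 1: coarse filter on the history-based confidence (second threshold interval).
    stage1 = [it for it in items if lo2 <= it.history_conf <= hi2]
    # Rank by the combined confidence; ascending order is an assumption.
    stage1.sort(key=lambda it: it.combined_conf)
    # Stage 2: keep items whose combined confidence lies in the third threshold interval.
    return [it.item_id for it in stage1 if lo3 <= it.combined_conf <= hi3]
```

The first stage uses the cheaper, labeler-level signal to narrow the candidate set. The second stage then applies the item-specific signal only to the remaining candidates.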
8. a kind of audit driving means of labeled data characterized by comprising
Determination unit, for determining the sequence for not auditing labeled data respectively for not auditing labeled data concentration;
Push unit, for being sorted based on the labeled data of not auditing respectively for not auditing labeled data concentration, never audit marks number
The auditor for not auditing labeled data and being pushed to setting is extracted according to concentration;And
Updating unit is based on collected auditing result number according to certain frequency for collecting the auditing result data of auditor
According to the sequence for not auditing labeled data respectively of labeled data concentration is not audited in update.
9. A computer-readable storage medium, characterized in that the storage medium comprises a stored program, wherein, when the program runs, a device on which the storage medium is located is controlled to perform the method for pushing labeled data for audit according to any one of claims 1 to 7.
10. a kind of storage management apparatus characterized by comprising
Memory, for storing program;
Processor is coupled to the memory, any into claim 7 with perform claim requirement 1 for running described program
The audit method for pushing of labeled data described in one.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910458916.4A CN110222244B (en) | 2019-05-29 | 2019-05-29 | Method and device for auditing and pushing labeled data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110222244A true CN110222244A (en) | 2019-09-10 |
CN110222244B CN110222244B (en) | 2022-03-01 |
Family ID: 67818900
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910458916.4A Active CN110222244B (en) | 2019-05-29 | 2019-05-29 | Method and device for auditing and pushing labeled data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110222244B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112270533A (en) * | 2020-11-12 | 2021-01-26 | 北京百度网讯科技有限公司 | Data processing method and device, electronic equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101968806A (en) * | 2010-10-22 | 2011-02-09 | 天津南大通用数据技术有限公司 | Data storage method, querying method and device |
CN102117302A (en) * | 2009-12-31 | 2011-07-06 | 南京理工大学 | Data origin tracking method on sensor data stream complex query results |
CN106485528A (en) * | 2015-09-01 | 2017-03-08 | 阿里巴巴集团控股有限公司 | Method and apparatus for detecting data |
US10009358B1 (en) * | 2014-02-11 | 2018-06-26 | DataVisor Inc. | Graph based framework for detecting malicious or compromised accounts |
- 2019-05-29: Application CN201910458916.4A filed in CN (CN110222244B), status: active
Also Published As
Publication number | Publication date |
---|---|
CN110222244B (en) | 2022-03-01 |
Similar Documents
Publication | Title |
---|---|
CN103810030B | Application recommendation method, apparatus and system based on a mobile terminal application market |
CN109522556A | Intention recognition method and device |
CN110232060A | Method and device for auditing labeled data |
US20190371438A1 | Computer-implemented system and method of facilitating artificial intelligence based revenue cycle management in healthcare |
CN106156092B | Data processing method and device |
US9122995B2 | Classification of stream-based data using machine learning |
CN106503006A | Method and device for sorting sub-applications within an app |
WO2004061740B1 | A surveying apparatus and method for compensation reports |
CN110162566A | Association analysis method and device for business data, computer equipment and storage medium |
CN109102332A | Data processing method, apparatus and electronic device |
CN108509461A | Ranking learning method and server based on reinforcement learning |
CN110263818A | Resume selection method, apparatus, terminal and computer-readable storage medium |
CN110659985A | Method and device for retrieving falsely rejected potential users, and electronic equipment |
CN109886778A | Method and system for recommending tie-in sale products for air tickets |
CN115145812B | Test case generation method and device, electronic equipment and storage medium |
CN107590195A | Text classification model training method, text classification method and device thereof |
CN108256970A | Method for product recommendation based on shopping needs |
CN108090503A | Online tuning method and apparatus for a multi-classifier, storage medium and electronic device |
CN107679884A | Method, apparatus, computer equipment and storage medium for group premium assessment |
CN110458600A | Portrait model training method, device, computer equipment and storage medium |
CN108805580A | Account analysis method, device and storage medium |
CN110222244A | Method and device for auditing and pushing labeled data |
WO2000054186A1 | Financial forecasting system and method for risk assessment and management |
CN105160003B | APP retrieval ranking method and system based on geographic location |
CN111160647B | Money laundering behavior prediction method and device |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |