CN110222244A - Audit push method and device for labeled data - Google Patents

Audit push method and device for labeled data

Info

Publication number
CN110222244A
Authority
CN
China
Prior art keywords
labeled data
auditing
labeler
audit
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910458916.4A
Other languages
Chinese (zh)
Other versions
CN110222244B (en)
Inventor
陈天伦
王嘉磊
张孝磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
4Paradigm Beijing Technology Co Ltd
Original Assignee
4Paradigm Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 4Paradigm Beijing Technology Co Ltd
Priority to CN201910458916.4A
Publication of CN110222244A
Application granted
Publication of CN110222244B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90348 Query processing by searching ordered data, e.g. alpha-numerically ordered data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/9035 Filtering based on additional data, e.g. user or group profiles

Abstract

The invention discloses an audit push method and device for labeled data and relates to the technical field of data processing. Its main purpose is to push the labeled data that is most worth auditing to auditors, thereby improving the efficiency of labeled-data auditing. The main technical scheme comprises: determining an ordering of the unaudited labeled data in an unaudited labeled data set; based on that ordering, extracting unaudited labeled data from the set and pushing it to designated auditors; and collecting the auditors' audit result data and, at a certain frequency, updating the ordering of the unaudited labeled data in the set based on the collected audit result data.

Description

Audit push method and device for labeled data
Technical field
The present invention relates to the technical field of data processing, and in particular to an audit push method and device for labeled data.
Background technique
With the arrival of the big-data era, data volumes in many industries are growing geometrically. To make better use of massive data, the data is usually labeled so that it can better drive production, operations, daily life, and other activities. When data is used in scenarios such as machine learning and data mining, the labeled data is usually audited to make the labels more accurate.
At present, labeled data is generally audited manually. In manual auditing, an auditor must audit the labeled data item by item, and the manual audit process is complete only after every item has been audited. This approach relies entirely on the auditor's own judgment and cannot order the audit by how good or bad the labels are likely to be, so auditing is largely blind and audit efficiency is low.
Summary of the invention
In view of this, the present invention proposes an audit push method and device for labeled data, whose main purpose is to push the labeled data that is most worth auditing to auditors, thereby improving the efficiency of labeled-data auditing.
In a first aspect, the present invention provides an audit push method for labeled data, the method comprising:
determining an ordering of the unaudited labeled data in an unaudited labeled data set;
based on the ordering of the unaudited labeled data in the unaudited labeled data set, extracting unaudited labeled data from the set and pushing it to designated auditors; and
collecting the auditors' audit result data and, at a certain frequency, updating the ordering of the unaudited labeled data in the unaudited labeled data set based on the collected audit result data.
In a second aspect, the present invention provides an audit push device for labeled data, the device comprising:
a determination unit, configured to determine an ordering of the unaudited labeled data in an unaudited labeled data set;
a push unit, configured to extract unaudited labeled data from the unaudited labeled data set, based on the ordering of the unaudited labeled data in the set, and push it to designated auditors; and
an updating unit, configured to collect the auditors' audit result data and, at a certain frequency, update the ordering of the unaudited labeled data in the unaudited labeled data set based on the collected audit result data.
In a third aspect, the present invention provides a computer-readable storage medium that includes a stored program, wherein, when the program runs, the device on which the storage medium resides is controlled to execute the audit push method for labeled data according to any one of the first aspect.
In a fourth aspect, the present invention provides a storage management apparatus, comprising:
a memory, configured to store a program; and
a processor, coupled to the memory and configured to run the program to execute the audit push method for labeled data according to any one of the first aspect.
With the above technical scheme, the audit push method and device for labeled data provided by the present invention determine an ordering of the unaudited labeled data in an unaudited labeled data set and, based on that ordering, extract unaudited labeled data from the set and push it to designated auditors. When the auditors finish auditing labeled data, their audit result data is collected and, at a certain frequency, the ordering of the unaudited labeled data in the set is updated based on the collected audit result data. Thus, when labeled data is audited, the data that is most worth auditing is pushed to the auditors according to an ordering that can be updated, which improves the efficiency of labeled-data auditing.
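The three steps of the scheme (order, push, collect and update) can be sketched as a simple loop. This is only an illustrative sketch, not the patented implementation: the names `Item`, `run_audit_loop`, `audit`, and `rescore` are hypothetical, the "higher score means more worth auditing" convention is an assumption, and "a certain frequency" is modeled here as "every `update_every` rounds".

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Item:
    item_id: str
    labeler: str
    score: float  # assumed convention: higher means more worth auditing


def run_audit_loop(
    pool: List[Item],
    audit: Callable[[Item], bool],           # stand-in for a human auditor
    rescore: Callable[[Item, list], float],  # stand-in for the ordering update
    batch_size: int = 2,
    update_every: int = 1,
) -> List[Tuple[Item, bool]]:
    results: List[Tuple[Item, bool]] = []
    rounds = 0
    while pool:
        # Order the unaudited items and push the head of the queue.
        pool.sort(key=lambda it: it.score, reverse=True)
        batch = pool[:batch_size]
        del pool[:batch_size]
        # Collect the audit result for every pushed item.
        results.extend((it, audit(it)) for it in batch)
        rounds += 1
        # At a fixed frequency, refresh the scores of what is still unaudited.
        if rounds % update_every == 0:
            for it in pool:
                it.score = rescore(it, results)
    return results
```

In this sketch the ordering of the remaining items can change after every update, so a labeler whose pushed items were found incorrect may have the rest of their items moved forward in the queue.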
The above is only an overview of the technical scheme of the present invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with the specification, and so that the above and other objects, features, and advantages of the invention are more readily apparent, specific embodiments of the invention are set out below.
Brief description of the drawings
To explain the embodiments of the invention or the technical schemes of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 shows a flow chart of an audit push method for labeled data provided by one embodiment of the present invention;
Fig. 2 shows a flow chart of an audit push method for labeled data provided by another embodiment of the present invention;
Fig. 3 shows a structural schematic diagram of an audit push device for labeled data provided by one embodiment of the present invention;
Fig. 4 shows a structural schematic diagram of an audit push device for labeled data provided by another embodiment of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and its scope fully conveyed to those skilled in the art.
As shown in Fig. 1, an embodiment of the present invention provides an audit push method for labeled data, which mainly includes:
101. Determine the ordering of the unaudited labeled data in the unaudited labeled data set.
In practical applications, the unaudited labeled data set contains a large amount of unaudited labeled data. This labeled data is obtained by at least one labeler labeling original unlabeled data, which may include, but is not limited to, one or more of text data, image data, voice data, and video data.
Specifically, depending on how the unaudited labeled data set is maintained and on the audit requirements, the set takes at least the following forms:
First, the unaudited labeled data set includes a set quantity of unaudited labeled data, and that quantity decreases as manual auditing proceeds.
Second, the unaudited labeled data set includes a set quantity of unaudited labeled data. As manual auditing proceeds, new labeled data from at least one labeler is obtained through a specified interface and added to the set, so that the quantity of unaudited labeled data in the set stays constant.
Third, the quantity of unaudited labeled data in the set is not limited. The set corresponds to at least one specific labeler, and the data labeled by those specific labelers within a set time period is collected into the set.
Fourth, the quantity of unaudited labeled data in the set is not limited. The set corresponds to at least one specific labeler; during manual auditing, new labeled data from those specific labelers is obtained through a specified interface and added to the set, so that the new labeled data can also be audited manually in time.
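The maintenance strategies above differ mainly in whether the set is replenished and whether its size is held constant. A minimal sketch follows, under stated assumptions: `UnauditedPool` and its parameters are hypothetical names, and `fetch_new` is a placeholder callable standing in for the "specified interface", whose real form the text does not describe.

```python
from typing import Callable, List, Optional


class UnauditedPool:
    """Unaudited labeled data set with optional replenishment (illustrative)."""

    def __init__(
        self,
        items: List[str],
        fetch_new: Optional[Callable[[Optional[int]], List[str]]] = None,
        keep_size: bool = False,
    ):
        self.items = list(items)
        self.fetch_new = fetch_new  # hypothetical "specified interface"
        self.keep_size = keep_size
        self.target_size = len(self.items)

    def take(self, n: int) -> List[str]:
        """Hand n items to auditors, then replenish if configured."""
        taken, self.items = self.items[:n], self.items[n:]
        if self.fetch_new is not None:
            # Fixed-size set: top up to the original size only.
            # Unbounded set: accept whatever the interface returns.
            limit = self.target_size - len(self.items) if self.keep_size else None
            new_items = self.fetch_new(limit)
            self.items.extend(new_items if limit is None else new_items[:limit])
        return taken
```

With no `fetch_new` the set shrinks as auditing proceeds (first form); with `fetch_new` and `keep_size=True` it stays at its original size (second form); with `fetch_new` alone it accepts all new data (fourth form). The third form would amount to building the initial `items` list from the chosen labelers and time period.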
Specifically, determining the ordering of the unaudited labeled data in the set includes the following steps: determine a confidence value for each labeled datum in the unaudited labeled data set, where the confidence value is related to the probability that the corresponding datum is labeled correctly; then determine the ordering of the unaudited labeled data based on those confidence values.
The confidence value in the embodiments of the present invention is related to the probability that the corresponding labeled datum is labeled correctly; that is, it reflects the correctness of the label. The labeled data can therefore be ordered by confidence value, so that auditors preferentially audit the data most worth auditing and the audit becomes more targeted. The confidence value of labeled data can be determined in at least the following ways:
First, obtain the confidence value of each labeled datum in the unaudited labeled data set from a specified interface, and take each obtained value as the confidence value of the corresponding datum. The specified interface is connected to a computing platform that calculates confidence values of labeled data. Whenever a confidence value is needed, it is simply fetched through the interface, so the confidence value of labeled data can be determined quickly.
Second, obtain audit result data of audited labeled data, and determine the confidence value of each labeled datum in the unaudited labeled data set based on that audit result data.
It should be noted that when the unaudited labeled data set is being ordered for the first time, the audit result data is the audit result data of a set quantity of historical audited labeled data. When the set is not being ordered for the first time, in order to optimize the confidence values of the unaudited labeled data so that they better reflect the probability of correct labeling, the audit result data consists of the audit result data of the set quantity of audited labeled data together with the collected audit result data of the auditors.
Specifically, the audit result data of the set quantity of audited labeled data can be obtained through at least the following four approaches. First, determine a set quantity of audited labeled data in a database that stores audited labeled data, and extract its audit result data. The audited labeled data so determined may be data that is identical, related, or similar to the unaudited labeled data; whether audited labeled data is identical, similar, or related to the unaudited labeled data can be judged on the principle of semantic similarity. Second, use a web crawler to obtain, from a specific network platform, audited labeled data that belongs to the same type of labeling task as the unaudited labeled data, determine a set quantity of audited labeled data from what was obtained, and extract its audit result data. Here too, the audited labeled data so determined may be identical, related, or similar to the unaudited labeled data, as judged on the principle of semantic similarity. It should be noted that the labeling task type may be related to the form of the original data (for example, text data or video data) or to the industry to which the original data belongs. Third, extract a certain quantity of labeled data from the unaudited labeled data set, push it to auditors for auditing, and collect the auditors' audit result data for the pushed data. The certain quantity is either a preset number, for example 100, or a percentage of the total amount of labeled data in the set; for example, if the total is 1000 and the percentage is 10%, the certain quantity is the product of 1000 and 10%, that is, 100. Fourth, when the labeled data in the unaudited labeled data set needs to be updated based on the audit results of already-audited labeled data from the set, the audit result data of the set quantity of audited labeled data includes: the audit result data of audited labeled data obtained from the database or the network platform, and the collected audit result data of the labeled data in the set that the auditors have audited. This approach optimizes the confidence values of the unaudited labeled data so that they better reflect the probability of correct labeling.
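For the third approach, the size of the initial audit sample follows directly from the two options in the text (a preset number or a percentage of the total). A small sketch, where the function name `seed_sample_size` is hypothetical and capping the preset number at the size of the set is an assumption not stated in the text:

```python
def seed_sample_size(total, preset=None, fraction=None):
    """Number of unaudited items to push for an initial audit."""
    if (preset is None) == (fraction is None):
        raise ValueError("give exactly one of preset or fraction")
    if preset is not None:
        # Assumption: never ask for more items than the set contains.
        return min(preset, total)
    return round(total * fraction)
```

For the example in the text, `seed_sample_size(1000, fraction=0.10)` gives the product of 1000 and 10%.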
Specifically, the audit result data includes the following information: historical labeling behavior information of the labeler of the unaudited labeled data, and/or labeling behavior information of the labeler for the unaudited labeled data.
Specifically, methods for determining the confidence value of each labeled datum in the unaudited labeled data set based on the information in the audit result data include at least the following:
Method one: calculate the confidence value of the unaudited labeled data based on the content of the historical labeling behavior information of its labeler.
Specifically, the historical labeling behavior information includes: the labeler of the unaudited labeled data and, among the audited labeled data, the quantity of correctly labeled data and the quantity of incorrectly labeled data.
Method two: calculate the confidence value of the unaudited labeled data based on the content of the labeling behavior information of the labeler for the unaudited labeled data, together with the content of the historical labeling behavior information of the labeler.
Specifically, the labeling behavior information of the labeler for the unaudited labeled data includes one or more of the following: the time the labeler spent labeling the unaudited labeled datum, the point in time at which the labeler labeled it, and the number of items between the unaudited labeled datum and the labeler's most recent incorrectly labeled datum.
Specifically, the historical labeling behavior information of the labeler of the unaudited labeled data includes one or more of the following: the labeler's average labeling duration for correctly labeled data among the audited labeled data; the time periods in which the labeler's incorrectly labeled data among the audited labeled data occurred; the average number of items between occurrences of the labeler's incorrectly labeled data among the audited labeled data; the quantity of the labeler's correctly labeled data among the audited labeled data; and the total amount of the labeler's audited labeled data.
Method three: calculate the confidence value of the unaudited labeled data based on the content of the historical labeling behavior information of its labeler.
Specifically, the historical labeling behavior information of the labeler of the unaudited labeled data includes: the quantity of the labeler's correctly labeled data among the audited labeled data, and the total amount of the labeler's audited labeled data.
Method four: combine method two and method three to determine the confidence value of the unaudited labeled data.
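Methods one and three both derive a labeler-level confidence value from counts of correctly and incorrectly labeled audited data. The text names the inputs but not the formula, so the sketch below makes an explicit assumption: the confidence is the fraction of the labeler's audited data that was labeled correctly. The name `labeler_confidence` and the fallback value of 0.5 for labelers with no audited data are hypothetical.

```python
from collections import Counter


def labeler_confidence(audit_results, default=0.5):
    """Build a lookup of labeler-level confidence values.

    audit_results: iterable of (labeler, is_correct) pairs from audited data.
    Assumed formula: correctly labeled count / total audited count.
    """
    correct, total = Counter(), Counter()
    for labeler, is_correct in audit_results:
        total[labeler] += 1
        correct[labeler] += int(is_correct)
    conf = {lab: correct[lab] / total[lab] for lab in total}
    # Labelers never audited get the (assumed) neutral default.
    return lambda labeler: conf.get(labeler, default)
```

Because the value depends only on the labeler, every unaudited datum from the same labeler receives the same confidence under this sketch; an item-level value as in method two would need the per-item behavior information as additional input.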
The following describes how the ordering of the unaudited labeled data is determined based on the confidence value of each labeled datum. Methods for determining the ordering include at least the following:
First, select from the unaudited labeled data set the unaudited labeled data whose confidence values lie within a preset threshold interval; then determine the ordering of the selected unaudited labeled data according to the content of the audit behavior information of their labelers.
Specifically, the audit behavior information includes one or more of the following: the number of consecutive correctly labeled data from the labeler, the number of consecutive incorrectly labeled data from the labeler, and the number of times the labeler has been audited.
Specifically, there are two kinds of confidence value. One kind is obtained from the historical labeling behavior information of the labeler of the labeled datum, or from the labeling behavior information of the labeler for that datum. The other kind is obtained from both the labeling behavior information of the labeler for that datum and the historical labeling behavior information of the labeler. One labeled datum may correspond to one or more of the above confidence values. Depending on which confidence values a labeled datum has, the process of selecting from the unaudited labeled data set the unaudited labeled data whose confidence values lie within a preset threshold interval takes at least the following forms:
First, when the confidence value of each unaudited labeled datum is obtained from the historical labeling behavior information of its labeler, or from the labeling behavior information of its labeler for that datum, select from the unaudited labeled data set the unaudited labeled data whose confidence values lie within a first threshold interval. For labeled data whose confidence values lie outside the first threshold interval, the probability of a correct label or of an incorrect label is high, so such data is not worth auditing. For unaudited labeled data whose confidence values lie within the first threshold interval, the correctness of the label cannot be clearly judged, so such data is the most worth auditing.
Second, when an unaudited labeled datum has two confidence values, one obtained from the historical labeling behavior information of its labeler and the other obtained from both the labeling behavior information of the labeler for that datum and the historical labeling behavior information of the labeler: first, using the confidence value obtained from the historical labeling behavior information, select from the unaudited labeled data set the unaudited labeled data whose confidence values lie within a second threshold interval; then, using the confidence value obtained from the labeling behavior information for the datum and the historical labeling behavior information, order the selected unaudited labeled data and select from it the unaudited labeled data whose confidence values lie within a third threshold interval.
Specifically, all data from the same labeler share the same confidence value obtained from the historical labeling behavior information of the labeler. That confidence value can therefore be used to screen labelers first, screening out all of the labeled data of some labelers. Screened-out labeled data whose confidence values lie outside the second threshold interval have a high probability of being labeled correctly or of being labeled incorrectly, so they are not worth auditing. For unaudited labeled data whose confidence values lie within the second threshold interval, the correctness of the label cannot be clearly judged, so this data is extracted and screened again using the confidence value obtained from the labeling behavior information for the datum and the historical labeling behavior information of the labeler. This second pass screens out further labeled data, which may belong to different labelers. Screened-out labeled data whose confidence values lie outside the third threshold interval have a high probability of being labeled correctly or of being labeled incorrectly, so they are not worth auditing, while for unaudited labeled data whose confidence values lie within the third threshold interval the correctness of the label cannot be clearly judged, so such data is the most worth auditing.
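The single-interval and two-stage selections described above can be sketched as plain filters. This is an illustration only: the dictionary keys `conf`, `labeler_conf`, and `item_conf` are hypothetical names for the confidence values, and treating the threshold intervals as closed (endpoints included) is an assumption.

```python
def in_interval(value, interval):
    low, high = interval
    return low <= value <= high


def select_single(items, first_interval):
    """Keep items whose single confidence value is inside the first interval."""
    return [it for it in items if in_interval(it["conf"], first_interval)]


def select_two_stage(items, second_interval, third_interval):
    """Screen by labeler-level confidence, then by item-level confidence."""
    # Stage 1: the labeler-level value is shared by all of a labeler's items,
    # so this step keeps or drops a labeler's data as a whole.
    stage1 = [it for it in items if in_interval(it["labeler_conf"], second_interval)]
    # Stage 2: the survivors are screened individually.
    return [it for it in stage1 if in_interval(it["item_conf"], third_interval)]
```

The two-stage form narrows the candidates cheaply at the labeler level before the finer item-level values are applied.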
It is concentrated it should be noted that no matter never auditing labeled data using which kind of mode among the above, chooses confidence level Value, which is located in preset threshold interval, does not audit labeled data, can to threshold interval outside do not audit labeled data do as Lower processing: first is that, choose labeled data there is the labeled data of the Probability maximum of marking error, in order to reduce the work of auditor It measures, this part labeled data will not participate in audit, directly send the labeled data of selection to setting by the corresponding interface Mark personnel are marked again.The mark personnel set described here is the original labelers of these labeled data, or, being System assert the mark higher labeler of correct probability.Second is that, the labeled data chosen occurs marking the mark of correct Probability maximum Infuse data, in order to reduce the workload of auditor, this part labeled data will not participate in audit, determine the labeled data of selection without Manual examination and verification are needed, Direct Mark is that audit passes through.Third is that the labeled data of not auditing outside threshold interval remains in not It audits labeled data to concentrate, when newly getting the auditing result data for having audited labeled data of preset quantity again, based on original Some auditing result data and the auditing result data newly obtained update the confidence value for not auditing labeled data respectively, thus right The confidence value of labeled data optimizes, so that confidence value more can reflect the correct general of the mark of corresponding labeled data Rate.
To further determine which labeled data is worth auditing, after the unaudited labeled data whose confidence values lie within the preset threshold interval has been selected from the unaudited labeled data set, the ordering of the selected unaudited labeled data needs to be determined according to the content of the audit behavior information of their labelers.
Illustratively, the audit behavior information includes the number of consecutive correctly labeled data from the labeler. This number is obtained from the collected audit result data of the auditors for labeled data in the unaudited labeled data set. When the number of the labeler's consecutive correctly labeled data is greater than a set quantity threshold, the probability that the labeler labels incorrectly is low; when the selected unaudited labeled data is ordered, that labeler's labeled data is placed in a position not worth auditing (for example, the tail of the queue).
Illustratively, the audit behavior information includes the number of consecutive incorrectly labeled data from the labeler. This number is obtained from the collected audit result data of the auditors for labeled data in the unaudited labeled data set. When the number of the labeler's consecutive incorrectly labeled data is greater than a set quantity threshold, the probability that the labeler labels incorrectly is high; when the selected unaudited labeled data is ordered, that labeler's labeled data is placed in a position worth auditing (for example, the head of the queue).
Illustratively, the audit behavior information includes the number of times the labeler has been audited. This number is obtained from the collected audit result data of the auditors for labeled data in the unaudited labeled data set. When the number of times the labeler has been audited is less than a set count threshold, the labeler's labeled data has been audited few times, so the probability of correct or incorrect labeling cannot be clearly judged. The labeler's labeled data is therefore worth auditing and is placed in a position worth auditing (for example, the head of the queue).
Illustratively, the audit behavior information includes both the number of consecutive incorrectly labeled data from the labeler and the number of times the labeler has been audited. When the number of times the labeler has been audited is less than the set count threshold and the number of the labeler's consecutive incorrectly labeled data is greater than the set quantity threshold, the probability that the labeler labels incorrectly is high; when the selected unaudited labeled data is ordered, that labeler's labeled data is placed in a position worth auditing (for example, the head of the queue).
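The queue-placement rules in these examples can be sketched as a ranking function followed by a stable sort. The names `LabelerStats`, `queue_rank`, and `order_by_audit_behavior` are hypothetical, the threshold defaults are arbitrary placeholders, and the precedence among the rules (error streak, then low audit count, then correct streak) is an assumption, since the text presents the rules separately.

```python
from dataclasses import dataclass


@dataclass
class LabelerStats:
    consecutive_correct: int
    consecutive_wrong: int
    times_audited: int


def queue_rank(stats, streak_threshold=5, audit_count_threshold=10):
    """Return 0 for head of queue, 1 for middle, 2 for tail."""
    if stats.consecutive_wrong > streak_threshold:
        return 0  # likely to label incorrectly: audit first
    if stats.times_audited < audit_count_threshold:
        return 0  # too few audits to judge: audit first
    if stats.consecutive_correct > streak_threshold:
        return 2  # unlikely to label incorrectly: audit last
    return 1


def order_by_audit_behavior(items, stats_by_labeler, **thresholds):
    # sorted() is stable, so items with equal rank keep their relative order.
    return sorted(
        items,
        key=lambda it: queue_rank(stats_by_labeler[it["labeler"]], **thresholds),
    )
```

Items from labelers that match none of the rules stay in the middle of the queue in their existing order.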
Second, never audit labeled data is concentrated, and is chosen confidence value and is located at not auditing in preset threshold interval Labeled data;The size of the confidence value of labeled data based on selection, determine the selection does not audit labeled data respectively Sequence.
Specifically, never audit labeled data here is concentrated, chooses confidence value and be located in preset threshold interval The process for not auditing labeled data is discussed essentially identical with the first, therefore will not be described in great detail here.
Specifically, the confidence value obtained by using the history of the labeler based on labeled data mark behavioural information, or Mark number is selected for confidence value obtained by the mark behavioural information of labeled data using the labeler based on labeled data According to when, then according to the history of the labeler based on labeled data mark behavioural information obtained by confidence value, or based on mark number According to labeler for the size of confidence value obtained by the mark behavioural information of labeled data, determine each mark of not auditing chosen The sequence of data is infused, which has symbolized the degree that each labeled data is worth audit.
Specifically, an unaudited labeled data item may have two confidence values: one obtained from the historical labeling behavior information of its labeler, and another obtained from the labeler's labeling behavior information for that labeled data together with the labeler's historical labeling behavior information. When labeled data is selected using these two confidence values respectively, the ordering of the selected unaudited labeled data is determined by the magnitude of the confidence value obtained from the labeling behavior information together with the historical labeling behavior information. The ordering indicates the degree to which each labeled data item is worth auditing.
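The second approach can be sketched in Python as follows. The function name `select_and_rank` and the dictionary layout are hypothetical. Two assumptions are made that the text does not state: the threshold interval is treated as closed, and items are ordered in ascending confidence so that the least trusted item comes first.

```python
def select_and_rank(confidences, low, high):
    """Keep items whose confidence lies in [low, high], lowest confidence first.

    confidences: item_id -> confidence value of that unaudited labeled data item
    """
    selected = [
        (item_id, value)
        for item_id, value in confidences.items()
        if low <= value <= high  # assumption: closed interval
    ]
    # Assumption: lower confidence means more worth auditing, so sort ascending.
    selected.sort(key=lambda pair: pair[1])
    return [item_id for item_id, _ in selected]


ranking = select_and_rank({"a": 0.95, "b": 0.40, "c": 0.70, "d": 0.99}, low=0.0, high=0.96)
```

With these inputs item `d` is screened out and the remaining items are ranked `b`, `c`, `a`.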
Third, determine the ranking score of each unaudited labeled data item using formula (1); then, based on the magnitude of the ranking scores, determine the ordering of the unaudited labeled data.
Where S_j denotes the ranking score of the j-th unaudited labeled data item; M_j denotes the confidence value of the j-th unaudited labeled data item; and M_in denotes the n-th unaudited labeled data item of labeler i among the unaudited labeled data.
Specifically, the ranking score is related to how necessary it is to audit the corresponding labeled data, and reflects whether the labeled data is worth auditing.
It should be noted that, before the ranking score of each unaudited labeled data item is determined using formula (1), the method may include selecting from the unaudited labeled data set the unaudited labeled data whose confidence value lies within a preset threshold interval. This preliminary screening reduces the quantity of labeled data to be audited.
102. Based on the ordering of the unaudited labeled data in the unaudited labeled data set, extract unaudited labeled data from the set and push it to a designated auditor.
Specifically, the ordering of the unaudited labeled data reflects whether each item is worth auditing. When unaudited labeled data is extracted and pushed to the designated auditor, pushing starts with the labeled data most worth auditing, and each push delivers the item that is currently most worth auditing in the ordering, so that the auditor audits the labeled data in a more targeted way.
Specifically, when unaudited labeled data is extracted from the unaudited labeled data set and pushed to the designated auditor, the labeled data and its corresponding confidence value may be displayed together in visual form.
103. Collect the auditor's audit result data and, at a certain frequency, update the ordering of the unaudited labeled data in the unaudited labeled data set based on the collected audit result data.
Specifically, the audit result data can be obtained through a specified interface when the auditor finishes auditing a labeled data item. When the quantity of audited labeled data reaches a set quantity, or when the current time reaches a preset time, the ordering of the unaudited labeled data in the unaudited labeled data set is updated based on the collected audit result data. This optimizes the confidence values of the labeled data so that they better reflect the probability that the corresponding labeling is correct. It should be noted that updating the ordering based on the collected audit result data is essentially the same as the process described above of determining the confidence value of each labeled data item in the unaudited labeled data set and then determining the ordering from those confidence values, and is therefore not repeated here.
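The update condition in step 103 can be sketched in Python as follows. The class `RerankTrigger` and its methods are hypothetical names introduced for illustration. As an assumption, the "preset time" is modeled as an elapsed interval since the last update rather than a fixed clock time, and the clock is injectable so the behavior can be checked without waiting.

```python
import time


class RerankTrigger:
    """Decides when collected audit results should trigger a re-ranking."""

    def __init__(self, batch_size, interval_seconds, clock=time.monotonic):
        self.batch_size = batch_size              # set quantity of audited items
        self.interval_seconds = interval_seconds  # assumed form of the preset time
        self.clock = clock
        self.pending = []
        self.last_update = clock()

    def record(self, audit_result):
        """Store one audit result; return True if a re-ranking is due."""
        self.pending.append(audit_result)
        return self.is_due()

    def is_due(self):
        enough_results = len(self.pending) >= self.batch_size
        enough_time = self.clock() - self.last_update >= self.interval_seconds
        return enough_results or enough_time

    def drain(self):
        """Return the collected results and reset the trigger."""
        results, self.pending = self.pending, []
        self.last_update = self.clock()
        return results


now = [0.0]  # fake clock for the usage example
trigger = RerankTrigger(batch_size=2, interval_seconds=60, clock=lambda: now[0])
first_due = trigger.record("result-1")
second_due = trigger.record("result-2")
batch = trigger.drain()
```

A caller would recompute the confidence values and the ordering from `batch` whenever `record` or `is_due` returns `True`.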
In the audit push method for labeled data provided by this embodiment of the invention, the ordering of the unaudited labeled data in the unaudited labeled data set is determined, and unaudited labeled data is extracted from the set according to that ordering and pushed to a designated auditor. As the auditor audits labeled data, the auditor's audit result data is collected and, at a certain frequency, the ordering of the unaudited labeled data in the set is updated based on the collected audit result data. Because the ordering is updated as auditing proceeds, the auditor is pushed the labeled data with the greatest audit value, which improves the efficiency of auditing labeled data.
Further, building on the method shown in Fig. 1, another embodiment of the invention provides an audit push method for labeled data. As shown in Fig. 2, the method mainly includes:
201. Determine the confidence value of each labeled data item in the unaudited labeled data set, where the confidence value is related to the probability that the labeling of the corresponding labeled data is correct.
Specifically, the form of the labeled data set and the method of obtaining the confidence value of a labeled data item in this step are essentially the same as described in step 101 and are not repeated here. The following explains the second confidence-value method described in step 101, namely "obtain the audit result data of audited labeled data, and determine the confidence value of each labeled data item in the unaudited labeled data set based on the audit result data". The specific step is: for each unaudited labeled data item, determine its confidence value based on the information included in the audit result data. It should be noted that the audit result data includes the following information: the historical labeling behavior information of the labeler of the unaudited labeled data, and/or the labeler's labeling behavior information for the unaudited labeled data. Depending on which information the audit result data includes, determining the confidence value of each unaudited labeled data item includes at least the following implementations:
First, calculate the confidence value of the unaudited labeled data based on the content of the historical labeling behavior information of its labeler. Specifically, the historical labeling behavior information includes: the quantity of correctly labeled data and the quantity of mislabeled data among the labeler's audited labeled data.
Specifically, calculating the confidence value of the unaudited labeled data based on the content of the historical labeling behavior information of its labeler includes the following two methods:
1. Calculate the confidence value of the unaudited labeled data by formula (2).
Where M_j denotes the confidence value of the j-th unaudited labeled data item; A_i denotes the quantity of correctly labeled data among the audited labeled data of labeler i, the labeler of the j-th unaudited labeled data item; B_i denotes the quantity of mislabeled data among the audited labeled data of labeler i; a denotes a constant greater than 0; and b denotes a constant greater than 0.
Specifically, the confidence value calculated by formula (2) depends on the labeler's historical behavior; that is, all labeled data labeled by the same labeler has the same confidence value. When the unaudited labeled data is sorted in this way, the labeled data of the same labeler is arranged contiguously. The ordering shows which labeler's labeled data is most worth auditing, so that the labeled data of one labeler can be pushed and audited together. For a confidence value calculated by formula (2), the larger the confidence value of the labeled data, the higher the probability that its labeling is correct.
Specifically, the values of the constants a and b in formula (2) can be determined according to business requirements. For example, a and b are both set to 1. The constants a and b are introduced, and both are greater than 0, to avoid the case in which the confidence value cannot be determined because A_i and/or B_i equals 0.
Formula (2) is illustrated with an example. For labeler 1 of unaudited labeled data 1, among 1000 audited labeled data items, 900 are correctly labeled and 100 are mislabeled, and the constants a and b are both 1. The confidence value of unaudited labeled data 1 determined by formula (2) is then:
2. Calculate the confidence value of the unaudited labeled data by formula (3).
Where M_j denotes the confidence value of the j-th unaudited labeled data item; A_i denotes the quantity of correctly labeled data among the audited labeled data of labeler i, the labeler of the j-th unaudited labeled data item; B_i denotes the quantity of mislabeled data among the audited labeled data of labeler i; e denotes a constant greater than 0; f denotes a constant greater than 0; and g denotes a constant greater than 0.
Specifically, the confidence value calculated by formula (3) depends on the labeler's historical behavior; that is, all labeled data labeled by the same labeler has the same confidence value. When the unaudited labeled data is sorted in this way, the labeled data of the same labeler is arranged contiguously. The ordering shows which labeler's labeled data is most worth auditing, so that the labeled data of one labeler can be pushed and audited together. For a confidence value calculated by formula (3), the larger the confidence value of the labeled data, the higher the probability that its labeling is correct.
Specifically, the values of the constants e, f and g in formula (3) can be determined according to business requirements. For example, e, f and g are all set to 1. The constants e, f and g are introduced, and all are greater than 0, to avoid the case in which the confidence value cannot be determined because A_i and/or B_i equals 0.
Formula (3) is illustrated with an example. For labeler 2 of unaudited labeled data 2, among 1000 audited labeled data items, 900 are correctly labeled and 100 are mislabeled, and the constants e, f and g are all 1. The confidence value of unaudited labeled data 2 determined by formula (3) is then:
Second, calculate the confidence value of the unaudited labeled data based on the content of the labeler's labeling behavior information for the unaudited labeled data and the content of the labeler's historical labeling behavior information. The labeler's labeling behavior information for the unaudited labeled data includes one or more of the following: the duration the labeler spent labeling the unaudited labeled data; the time point at which the labeler labeled the unaudited labeled data; and the interval count between the unaudited labeled data and the labeler's most recent mislabeled data. The historical labeling behavior information of the labeler includes one or more of the following: the average labeling duration of correctly labeled data among the labeler's audited labeled data; the error period corresponding to mislabeled data among the labeler's audited labeled data; the average interval count at which mislabeled data occurs among the labeler's audited labeled data; the quantity of correctly labeled data among the labeler's audited labeled data; and the total quantity of the labeler's audited labeled data.
Specifically, because the content of the labeler's labeling behavior information for the unaudited labeled data can differ, and the content of the labeler's historical labeling behavior information can differ, calculating the confidence value of the unaudited labeled data from these two kinds of information includes the following methods:
1. Based on the duration the labeler spent labeling the unaudited labeled data and the average labeling duration of correctly labeled data among the labeler's audited labeled data, calculate the confidence value of the unaudited labeled data by formula (4).
Where M_j denotes the confidence value of the j-th unaudited labeled data item; T_ij denotes the duration that labeler i, the labeler of the j-th unaudited labeled data item, spent labeling that item; R_i denotes the average labeling duration of correctly labeled data among the audited labeled data of labeler i; and n denotes a constant greater than or equal to 1.
Specifically, a smaller confidence value calculated by formula (4) indicates that the labeler spent less time labeling the data item, that is, less than the time the labeler normally needs. The labeler is then more likely to have labeled perfunctorily, so the labeled data is more likely to be mislabeled. A larger confidence value calculated by formula (4) indicates that the labeler spent more time labeling the data item, that is, at least the time the labeler normally needs. The labeler is then more likely to have labeled carefully, so the labeled data is less likely to be mislabeled. Therefore, for a confidence value calculated by formula (4), the larger the confidence value of the labeled data, the higher the probability that its labeling is correct.
Specifically, the value of the constant n in formula (4) can be determined according to business requirements. For example, n is set to 1.
Formula (4) is illustrated with an example. For labeler 3 of unaudited labeled data 3, labeler 3 spent 5 minutes labeling labeled data 3, the average labeling duration of correctly labeled data among labeler 3's audited labeled data is 4 minutes, and n is 1. The confidence value of unaudited labeled data 3 determined by formula (4) is then:
2. Based on the time point at which the labeler labeled the unaudited labeled data and the error period corresponding to mislabeled data among the labeler's audited labeled data, calculate the confidence value of the unaudited labeled data by formula (5).
Where M_j denotes the confidence value of the j-th unaudited labeled data item; t_ij denotes the time point at which labeler i, the labeler of the j-th unaudited labeled data item, labeled that item; [t1_i, t2_i] denotes the error period corresponding to mislabeled data among the audited labeled data of labeler i; and m1 and m2 denote constants, with m2 greater than m1.
Specifically, a small confidence value calculated by formula (5) indicates that the labeled data was labeled during a period in which the labeler's error rate is relatively high, so the labeled data is more likely to be mislabeled. A large confidence value calculated by formula (5) indicates that the labeled data was not labeled during such a period, so the labeled data is less likely to be mislabeled. Therefore, for a confidence value calculated by formula (5), the larger the confidence value of the labeled data, the higher the probability that its labeling is correct.
Specifically, the values of m1 and m2 in formula (5) can be determined according to business requirements. It should be noted that, in order to distinguish the probabilities that labeled data is correctly labeled, m2 is set greater than m1. For example, m2 is 1 and m1 is 0.95.
Formula (5) is illustrated with an example. For labeler 4 of unaudited labeled data 4, the time point at which labeler 4 labeled labeled data 4 is 13:00, the error period corresponding to mislabeled data among labeler 4's audited labeled data is [12:00, 14:00], m2 is 1, and m1 is 0.95. The confidence value of unaudited labeled data 4 determined by formula (5) is then:
M_4 = 0.95, since 13:00 ∈ [12:00, 14:00]
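The behavior described for formula (5) can be sketched in Python as follows. The formula itself is not reproduced in this text, so the piecewise form used here (m1 inside the error period, m2 outside it) is inferred from the surrounding description and the worked example. The function name `time_point_confidence` is hypothetical, the period is assumed to be a closed interval, and it is assumed not to cross midnight.

```python
from datetime import time


def time_point_confidence(label_time, error_start, error_end, m1=0.95, m2=1.0):
    """Return m1 if the labeling time falls in the labeler's error period, else m2."""
    if not m2 > m1:
        raise ValueError("m2 must be greater than m1")
    # Assumptions: closed interval, error period contained within a single day.
    in_error_period = error_start <= label_time <= error_end
    return m1 if in_error_period else m2


# Worked example from the text: labeled at 13:00, error period [12:00, 14:00].
m_4 = time_point_confidence(time(13, 0), time(12, 0), time(14, 0))
```

For the values in the example this yields 0.95, and a labeling time outside the period, such as 15:00, yields 1.0.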
3. Based on the interval count between the unaudited labeled data and the labeler's most recent mislabeled data, and the average interval count at which mislabeled data occurs among the labeler's audited labeled data, calculate the confidence value of the unaudited labeled data by formula (6).
Where M_j denotes the confidence value of the j-th unaudited labeled data item; P_ij denotes the interval count between the j-th unaudited labeled data item and the most recent mislabeled data of its labeler i; Q_i denotes the average interval count at which mislabeled data occurs among the audited labeled data of labeler i; and k1 and k2 denote constants, with k1 greater than k2.
Specifically, as labeling proceeds, a labeler who has labeled a certain quantity of data becomes fatigued, which leads to mislabeled data. Labeling fatigue can be characterized by the average interval count between occurrences of mislabeled data, so the interval count between labeled data items can reflect the probability that the labeling is correct. A small confidence value calculated by formula (6) indicates that the data was labeled when the labeler's labeling fatigue was relatively high, so the labeled data is more likely to be mislabeled. A large confidence value calculated by formula (6) indicates that the data was labeled when the labeler's labeling fatigue was relatively low, so the labeled data is less likely to be mislabeled. Therefore, for a confidence value calculated by formula (6), the larger the confidence value of the labeled data, the higher the probability that its labeling is correct.
Specifically, the values of k1 and k2 in formula (6) can be determined according to business requirements. It should be noted that, in order to distinguish the probabilities that labeled data is correctly labeled, k1 is set greater than k2. For example, k1 is 1 and k2 is 0.9.
Formula (6) is illustrated with an example. For labeler 5 of unaudited labeled data 5, the interval count between labeled data 5 and labeler 5's most recent mislabeled data is 5, the average interval count between mislabeled data among labeler 5's audited labeled data is 100, k1 is 1, and k2 is 0.9. The confidence value of unaudited labeled data 5 determined by formula (6) is then:
4. Any two or more of formula (4), formula (5) and formula (6) can be combined according to business requirements, and the confidence value of the unaudited labeled data is calculated using the combination.
When formula (4) and formula (5) are combined, the confidence value of the unaudited labeled data is calculated using the following formula:
For a confidence value calculated by this formula, the larger the confidence value of the labeled data, the higher the probability that its labeling is correct. For the meaning of the variables in this formula, see formula (4) and formula (5) above. ω1 and ω2 are preset weights whose values can be determined according to the specific business.
When formula (4) and formula (6) are combined, the confidence value of the unaudited labeled data is calculated using the following formula:
For a confidence value calculated by this formula, the larger the confidence value of the labeled data, the higher the probability that its labeling is correct. For the meaning of the variables in this formula, see formula (4) and formula (6) above. ω3 and ω4 are preset weights whose values can be determined according to the specific business.
When formula (5) and formula (6) are combined, the confidence value of the unaudited labeled data is calculated using the following formula:
For a confidence value calculated by this formula, the larger the confidence value of the labeled data, the higher the probability that its labeling is correct. For the meaning of the variables in this formula, see formula (5) and formula (6) above. ω5 and ω6 are preset weights whose values can be determined according to the specific business.
When formula (4), formula (5) and formula (6) are combined, the confidence value of the unaudited labeled data is calculated using the following formula:
For a confidence value calculated by this formula, the larger the confidence value of the labeled data, the higher the probability that its labeling is correct. For the meaning of the variables in this formula, see formula (4), formula (5) and formula (6) above. ω7, ω8 and ω9 are preset weights whose values can be determined according to the specific business.
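One way to read the combinations in item 4 is as a weighted sum of the individual confidence values. The combined formulas are not reproduced in this text, so the weighted-sum form below is an assumption and not the patent's stated formula; the function name `combine_confidences` and the weight values are likewise hypothetical.

```python
def combine_confidences(components, weights):
    """Weighted sum of component confidence values (assumed combination form).

    components: confidence values from the individual formulas, e.g. (4), (5), (6)
    weights:    preset weights, one per component, chosen per business needs
    """
    if len(components) != len(weights):
        raise ValueError("each component needs exactly one weight")
    return sum(weight * value for weight, value in zip(weights, components))


# Hypothetical values: three component confidences and weights that sum to 1.
combined = combine_confidences([1.0, 0.95, 0.9], [0.5, 0.25, 0.25])
```

With weights that sum to 1 and components in [0, 1], the combined value stays in [0, 1], which keeps it comparable with the single-formula confidence values.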
5. Based on the time point at which the labeler labeled the unaudited labeled data, the error period corresponding to mislabeled data among the labeler's audited labeled data, the interval count between the unaudited labeled data and the labeler's most recent mislabeled data, and the average interval count at which mislabeled data occurs among the labeler's audited labeled data, calculate the confidence value of the unaudited labeled data by formula (7);
Where M_j denotes the confidence value of the j-th unaudited labeled data item; t_ij denotes the time point at which labeler i, the labeler of the j-th unaudited labeled data item, labeled that item; [t1_i, t2_i] denotes the error period corresponding to mislabeled data among the audited labeled data of labeler i; m1 and m2 denote constants, with m2 greater than m1; P_ij denotes the interval count between the j-th unaudited labeled data item and the most recent mislabeled data of its labeler i; Q_i denotes the average interval count at which mislabeled data occurs among the audited labeled data of labeler i; and k1 and k2 denote constants, with k1 greater than k2.
Specifically, the confidence value calculated by formula (7) reflects the temporal position of the labeled data at the time the labeler labeled it. That temporal position can reflect how fatigued the labeler was when labeling, so the confidence value calculated by formula (7) can realistically reflect the probability that the labeling is correct.
Specifically, a smaller confidence value calculated by formula (7) indicates that the data was labeled when the labeler's labeling fatigue was relatively high, so the labeled data is more likely to be mislabeled. A larger confidence value calculated by formula (7) indicates that the data was labeled when the labeler's labeling fatigue was relatively low, so the labeled data is less likely to be mislabeled. Therefore, for a confidence value calculated by formula (7), the larger the confidence value of the labeled data, the higher the probability that its labeling is correct.
6. Any two or more of formula (4), formula (5), formula (6) and formula (7) can be combined according to business requirements, and the confidence value of the unaudited labeled data is calculated using the combination.
For example, when formula (4) and formula (7) are combined, the confidence value of the unaudited labeled data is calculated using the following formula:
For a confidence value calculated by this formula, the larger the confidence value of the labeled data, the higher the probability that its labeling is correct. For the meaning of the variables in this formula, see formula (4) and formula (7) above. ω10 and ω11 are preset weights whose values can be determined according to the specific business.
7. Based on the duration the labeler spent labeling the unaudited labeled data, the time point at which the labeler labeled the unaudited labeled data, the average labeling duration of correctly labeled data among the labeler's audited labeled data, the error period corresponding to mislabeled data among the labeler's audited labeled data, the quantity of correctly labeled data among the labeler's audited labeled data, and the total quantity of the labeler's audited labeled data, calculate the confidence value of the unaudited labeled data by formula (8);
Where M_j denotes the confidence value of the j-th unaudited labeled data item; T_ij denotes the duration that labeler i, the labeler of the j-th unaudited labeled data item, spent labeling that item; R_i denotes the average labeling duration of correctly labeled data among the audited labeled data of labeler i; n denotes a constant greater than or equal to 1; t_ij denotes the time point at which labeler i labeled the j-th unaudited labeled data item; [t1_i, t2_i] denotes the error period corresponding to mislabeled data among the audited labeled data of labeler i; m1 and m2 denote constants, with m2 greater than m1; P_ij denotes the interval count between the j-th unaudited labeled data item and the most recent mislabeled data of its labeler i; Q_i denotes the average interval count at which mislabeled data occurs among the audited labeled data of labeler i; k1 and k2 denote constants, with k1 greater than k2; α denotes the first weight; β denotes the second weight; and γ denotes the third weight.
With the methods described in items 1 to 7 above, the confidence value of the unaudited labeled data is calculated from the content of the labeler's labeling behavior information for the unaudited labeled data and the content of the labeler's historical labeling behavior information. The calculated confidence value is therefore related not only to the labeler's historical behavior but also to the labeler's labeling behavior for the specific labeled data, and labeled data labeled by the same labeler may have different confidence values. When the ordering of the unaudited labeled data is determined from confidence values obtained in this way, it shows which labeled data item is currently most worth auditing, which makes the audit of labeled data more targeted.
Third, calculate the confidence value of the unaudited labeled data based on the content of the historical labeling behavior information of its labeler. Here the historical labeling behavior information includes: the quantity of correctly labeled data among the labeler's audited labeled data and the total quantity of the labeler's audited labeled data.
Specifically, the ratio of the quantity of correctly labeled data among the labeler's audited labeled data to the total quantity of the labeler's audited labeled data is taken as the confidence value of the unaudited labeled data. The resulting confidence value depends on the labeler's historical behavior; that is, all labeled data labeled by the same labeler has the same confidence value. When the unaudited data is sorted by confidence values obtained in this way, the ordering shows which labeler's labeled data is most worth auditing, so that the labeled data of one labeler can be audited together. The larger the confidence value of the labeled data, the higher the probability that its labeling is correct.
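The ratio described above can be sketched in Python as follows. The function name `accuracy_confidence` is hypothetical, and raising an error when the labeler has no audited data is an assumption, since the text does not say how that case is handled.

```python
def accuracy_confidence(correct_count, total_audited):
    """Confidence value = correctly labeled audited items / all audited items."""
    if total_audited <= 0:
        # Assumption: with no audited data the ratio is undefined, so fail loudly.
        raise ValueError("the labeler has no audited labeled data")
    if not 0 <= correct_count <= total_audited:
        raise ValueError("correct_count must lie between 0 and total_audited")
    return correct_count / total_audited


# A labeler with 900 correct items out of 1000 audited items.
confidence = accuracy_confidence(900, 1000)
```

Every unaudited item of that labeler receives the same value, 0.9, which is why items of one labeler end up adjacent in the ordering.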
It should be noted that the method of taking the ratio of the quantity of correctly labeled data among the labeler's audited labeled data to the total quantity of the labeler's audited labeled data as the confidence value can be combined, according to business requirements, with any one or more of formula (4), formula (5), formula (6) and formula (7) above to calculate the confidence value of the unaudited labeled data.
For example, when the ratio is combined with formula (4), the confidence value of the unaudited labeled data is calculated using the following formula:
For a confidence value calculated by this formula, the larger the confidence value of the labeled data, the higher the probability that its labeling is correct. For the meaning of the variables in this formula, see formula (3) and formula (7) above. ω13 and ω14 are preset weights whose values can be determined according to the specific business.
202. Based on the confidence value of each labeled data item, determine the ordering of the unaudited labeled data.
Specifically, the process of determining the ordering of the unaudited labeled data based on the confidence value of each labeled data item is essentially the same as described in step 101 and is not repeated here.
203. Based on the ordering of the unaudited labeled data in the unaudited labeled data set, extract unaudited labeled data from the set and push it to a designated auditor.
Specifically, the detailed annotation of this step and the detailed annotation in step 103 are essentially identical, therefore will not be further discussed here.
204. Assist the manual audit of labeled data based on the confidence values of the labeled data.
Specifically, assisting the manual audit of labeled data based on the confidence values includes at least the following methods:
First, display the labeled data together with their corresponding confidence values in a visual form.
Specifically, the visual form in this approach is a preset visualization window that displays each labeled data item alongside its confidence value. The items are displayed in sorted order, so that the auditor can quickly pick out the labeled data most worth auditing among those currently displayed.
Second, compare the manual audit result of a labeled data item with its confidence value, and when the comparison result meets a preset condition, output prompt information indicating that the audit result may be wrong.
Specifically, the manual audit result of a labeled data item is either pass or fail: a pass indicates that the data are labeled correctly, and a fail indicates that the data are labeled incorrectly. Different manual audit results correspond to different confidence value intervals. When the audit of a labeled data item is completed, the confidence value interval corresponding to its manual audit result is compared with the item's confidence value in order to verify the auditor's result. If the interval corresponding to the manual audit result does not contain the item's confidence value, the auditor is likely to have made an error, so prompt information indicating that the audit result may be wrong is output to prompt the auditor to re-audit the item, thereby improving audit quality. If the interval contains the item's confidence value, the audit is regarded as correct and the audit of the item is complete.
Third, combine the two approaches above: first display the labeled data together with their confidence values in a visual form; then, once the auditor has audited a labeled data item, compare its manual audit result with its confidence value, and when the comparison result meets the preset condition, output prompt information indicating that the audit result may be wrong.
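The cross-check in the second approach can be illustrated as follows. The interval boundaries, the result labels `"pass"` and `"fail"`, the function name `check_audit`, and the prompt wording are all hypothetical; the text only states that each manual audit result maps to a confidence value interval and that a prompt is output when the item's confidence value falls outside that interval.

```python
# Hypothetical intervals: a pass is expected for high confidence, a fail for low.
RESULT_INTERVALS = {"pass": (0.5, 1.0), "fail": (0.0, 0.5)}

def check_audit(result, confidence, intervals=RESULT_INTERVALS):
    """Return None if the audit result is consistent with the confidence value,
    otherwise return a prompt suggesting a re-audit."""
    low, high = intervals[result]
    if low <= confidence <= high:
        return None  # confidence lies in the expected interval: audit is complete
    return "Audit result may be wrong; please re-audit this item."

consistent = check_audit("pass", 0.9)   # high confidence and a pass: no prompt
suspicious = check_audit("fail", 0.9)   # high confidence but a fail: prompt
```

In practice the intervals would be chosen per business scenario; this sketch only shows the containment test itself.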
205. Collect the auditor's audit result data and, at a certain frequency, update the sorting of the unaudited labeled data in the unaudited labeled data set based on the collected audit result data.
Specifically, this step is essentially the same as that detailed in step 104, and is not repeated here.
Further, corresponding to the above method embodiments, another embodiment of the present invention provides an audit push device for labeled data. As shown in Fig. 3, the device includes:
a determination unit 31, configured to determine the sorting of the unaudited labeled data in an unaudited labeled data set;
a push unit 32, configured to extract unaudited labeled data from the unaudited labeled data set, based on the sorting of the unaudited labeled data in the set, and push them to a designated auditor; and
an updating unit 33, configured to collect the auditor's audit result data and, at a certain frequency, update the sorting of the unaudited labeled data in the unaudited labeled data set based on the collected audit result data.
The audit push device for labeled data provided by this embodiment of the present invention determines the sorting of the unaudited labeled data in the unaudited labeled data set and, based on that sorting, extracts unaudited labeled data from the set and pushes them to a designated auditor. Once the auditor has audited labeled data, the device collects the auditor's audit result data and, at a certain frequency, updates the sorting of the unaudited labeled data in the set based on the collected audit result data. Thus, because labeled data with higher audit value are pushed to the auditor according to a sorting that is updated as auditing proceeds, the efficiency of auditing labeled data can be improved.
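The interplay of the three units can be sketched as a small class. Everything named here is an assumption made for illustration: the class `AuditPusher`, the `batch_size` and `update_every` parameters, the lowest-score-first order, and the `rescore` callback that stands in for whichever confidence calculation is used. "A certain frequency" is modeled simply as "after every `update_every` collected results".

```python
class AuditPusher:
    """Toy model of the determination, push and updating units."""

    def __init__(self, items, scores, batch_size=2, update_every=2):
        self.pending = list(items)        # unaudited labeled data set
        self.scores = dict(scores)        # item -> ordering score
        self.batch_size = batch_size
        self.update_every = update_every
        self.collected = []               # collected audit result data
        self._sort()

    def _sort(self):
        # Determination unit: lowest score first (an assumed ordering).
        self.pending.sort(key=lambda item: self.scores[item])

    def push(self):
        # Push unit: take the head of the current sorting.
        batch = self.pending[:self.batch_size]
        self.pending = self.pending[self.batch_size:]
        return batch

    def collect(self, item, passed, rescore):
        # Updating unit: store the result and re-sort every `update_every` results.
        self.collected.append((item, passed))
        if len(self.collected) % self.update_every == 0:
            self.scores.update(rescore(self.collected, self.pending))
            self._sort()

def rescore(collected, pending):
    # Hypothetical rule: pretend the new results lowered the score of item "a".
    return {"a": 0.1, "d": 0.8}

pusher = AuditPusher(["a", "b", "c", "d"], {"a": 0.9, "b": 0.2, "c": 0.5, "d": 0.7})
first = pusher.push()                 # pushes the two lowest-scored items
pusher.collect("b", False, rescore)
pusher.collect("c", True, rescore)    # second result triggers a re-sort
second = pusher.push()
```

The point of the sketch is that the second batch is drawn from a sorting that already reflects the audit results of the first batch.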
Optionally, as shown in Fig. 4, the determination unit 31 includes:
a first determining subunit 311, configured to determine the confidence value of each labeled data item in the unaudited labeled data set, the confidence value being related to the probability that the corresponding labeled data item is labeled correctly;
a second determining subunit 312, configured to determine the sorting of the unaudited labeled data based on the confidence value of each labeled data item.
Optionally, as shown in Fig. 4, the second determining subunit 312 includes:
a first selection module 3121, configured to select, from the unaudited labeled data set, unaudited labeled data whose confidence values lie within a preset threshold interval;
a first determining module 3122, configured to determine the sorting of the selected unaudited labeled data according to the content of the audit behavior information of the labelers of the selected unaudited labeled data;
wherein the audit behavior information includes one or more of the following: the number of labeled data items the labeler has labeled correctly in succession, the number of labeled data items the labeler has labeled incorrectly in succession, and the number of times the labeler has been audited.
Optionally, as shown in Fig. 4, the determination unit 31 includes:
a third determining subunit 313, configured to determine the ordering score of each unaudited labeled data item using formula (1), and to determine the sorting of the unaudited labeled data based on the magnitude of the ordering scores;
wherein S_j denotes the ordering score of the j-th unaudited labeled data item; M_j denotes the confidence value of the j-th unaudited labeled data item; and M_in denotes the n-th unaudited labeled data item of labeler i among the unaudited labeled data.
Optionally, as shown in Fig. 4, the second determining subunit 312 includes:
a second selection module 3123, configured to select, from the unaudited labeled data set, unaudited labeled data whose confidence values lie within a preset threshold interval;
a second determining module 3124, configured to determine the sorting of the selected unaudited labeled data based on the magnitude of their confidence values.
Optionally, as shown in Fig. 4, when the confidence value of each unaudited labeled data item is obtained based on the historical labeling behavior information of the item's labeler, or is obtained based on the labeler's labeling behavior information for that item,
the first selection module 3121 or the second selection module 3123 is configured to select, from the unaudited labeled data set, unaudited labeled data whose confidence values lie within a first threshold interval.
Optionally, as shown in Fig. 4, when each unaudited labeled data item has two confidence values, one obtained based on the historical labeling behavior information of the item's labeler, and the other obtained based on both the labeler's labeling behavior information for that item and the labeler's historical labeling behavior information,
the first selection module 3121 or the second selection module 3123 is configured to: select, from the unaudited labeled data set, unaudited labeled data whose confidence value obtained from the labeler's historical labeling behavior information lies within a second threshold interval; sort the selected unaudited labeled data by the confidence value obtained from the labeler's labeling behavior information for the item together with the labeler's historical labeling behavior information; and select, from the selected unaudited labeled data, those whose confidence value lies within a third threshold interval.
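The two-stage selection with two confidence values can be sketched as below. The field names `history_conf` and `combined_conf`, the function name `two_stage_select`, the ascending sort, and the concrete interval bounds are assumptions for the example; the text only specifies a filter on the history-based value, a sort by the second value, and a second filter.

```python
def two_stage_select(items, second_interval, third_interval):
    """Filter by the history-based confidence, sort by the combined confidence,
    then filter again by the combined confidence."""
    low2, high2 = second_interval
    # Stage 1: keep items whose history-based confidence is in the second interval.
    stage1 = [it for it in items if low2 <= it["history_conf"] <= high2]
    # Sort the selected items by the combined confidence (ascending is assumed).
    stage1.sort(key=lambda it: it["combined_conf"])
    low3, high3 = third_interval
    # Stage 2: keep items whose combined confidence is in the third interval.
    return [it for it in stage1 if low3 <= it["combined_conf"] <= high3]

items = [
    {"id": "x", "history_conf": 0.3, "combined_conf": 0.4},
    {"id": "y", "history_conf": 0.9, "combined_conf": 0.2},
    {"id": "z", "history_conf": 0.5, "combined_conf": 0.1},
    {"id": "w", "history_conf": 0.4, "combined_conf": 0.8},
]
selected = two_stage_select(items, second_interval=(0.0, 0.6), third_interval=(0.0, 0.5))
selected_ids = [it["id"] for it in selected]
```

Here item "y" is dropped in the first stage despite its low combined value, because its labeler's history-based confidence lies outside the second interval.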
Optionally, as shown in Fig. 4, the first determining subunit 311 includes:
a third determining module 3111, configured to obtain audit result data of audited labeled data and to determine, based on the audit result data, the confidence value of each labeled data item in the unaudited labeled data set; wherein,
when the unaudited labeled data set is being sorted for the first time, the audit result data are the audit result data of a set quantity of historical audited labeled data;
when the unaudited labeled data set is not being sorted for the first time, the audit result data are the audit result data of the set quantity of audited labeled data together with the collected audit result data of the auditor.
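The distinction between the first sort and later sorts can be expressed as a small helper. The function name, its parameters, and the choice to take the most recent `set_quantity` historical records are assumptions for illustration; the text does not say how the set quantity of historical records is chosen.

```python
def audit_results_for_sort(history_results, collected_results, set_quantity, first_sort):
    """Pick the audit result data that feed the confidence calculation."""
    # Assumption: the "set quantity" means the most recent records in the history.
    base = list(history_results[-set_quantity:]) if set_quantity > 0 else []
    if first_sort:
        return base                          # first sort: history only
    return base + list(collected_results)    # later sorts: history plus new results

history = [1, 2, 3, 4, 5]                    # stand-ins for historical audit records
first_input = audit_results_for_sort(history, [], set_quantity=3, first_sort=True)
later_input = audit_results_for_sort(history, [9], set_quantity=3, first_sort=False)
```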
Optionally, as shown in Fig. 4, the third determining module 3111 is configured to determine, for each unaudited labeled data item, the confidence value of the item based on the information included in the audit result data;
wherein the audit result data include the following information: the historical labeling behavior information of the labeler of the unaudited labeled data item, and/or the labeler's labeling behavior information for the unaudited labeled data item.
Optionally, as shown in Fig. 4, the third determining module 3111 includes:
a first calculation submodule 31111, configured to calculate the confidence value of the unaudited labeled data item based on the content of the historical labeling behavior information of the item's labeler;
wherein the historical labeling behavior information includes the following: for the labeler of the unaudited labeled data item, the number of correctly labeled items and the number of incorrectly labeled items among the labeler's audited labeled data.
Optionally, as shown in Fig. 4, the first calculation submodule 31111 is configured to calculate the confidence value of the unaudited labeled data item by formula (2);
wherein M_j denotes the confidence value of the j-th unaudited labeled data item; A_i denotes the number of correctly labeled items among the audited labeled data of labeler i, the labeler of the j-th unaudited labeled data item; B_i denotes the number of incorrectly labeled items among the audited labeled data of labeler i; a denotes a constant greater than 0; and b denotes a constant greater than 0.
Optionally, as shown in Fig. 4, the first calculation submodule 31111 is configured to calculate the confidence value of the unaudited labeled data item by formula (3);
wherein M_j denotes the confidence value of the j-th unaudited labeled data item; A_i denotes the number of correctly labeled items among the audited labeled data of labeler i, the labeler of the j-th unaudited labeled data item; B_i denotes the number of incorrectly labeled items among the audited labeled data of labeler i; e denotes a constant greater than 0; f denotes a constant greater than 0; and g denotes a constant greater than 0.
Optionally, as shown in Fig. 4, the third determining module 3111 includes:
a second calculation submodule 31112, configured to calculate the confidence value of the unaudited labeled data item based on the content of the labeler's labeling behavior information for the unaudited labeled data item and the content of the labeler's historical labeling behavior information; wherein,
the labeler's labeling behavior information for the unaudited labeled data item includes one or more of the following: the labeling duration the labeler spent on the unaudited labeled data item, the labeling time point at which the labeler labeled the item, and the interval count between the item and the labeler's most recent incorrectly labeled data item;
the historical labeling behavior information of the labeler of the unaudited labeled data item includes one or more of the following: the average labeling duration of the correctly labeled items among the labeler's audited labeled data, the error period corresponding to the incorrectly labeled items among the labeler's audited labeled data, the average interval count between occurrences of incorrectly labeled items among the labeler's audited labeled data, the number of correctly labeled items among the labeler's audited labeled data, and the total amount of the labeler's audited labeled data.
Optionally, as shown in Fig. 4, the second calculation submodule 31112 is configured to calculate the confidence value of the unaudited labeled data item by formula (4), based on the labeling duration the labeler spent on the unaudited labeled data item and the average labeling duration of the correctly labeled items among the labeler's audited labeled data;
wherein M_j denotes the confidence value of the j-th unaudited labeled data item; T_ij denotes the labeling duration that labeler i, the labeler of the j-th unaudited labeled data item, spent labeling that item; R_i denotes the average labeling duration of the correctly labeled items among the audited labeled data of labeler i; and n denotes a constant greater than or equal to 1.
Optionally, as shown in Fig. 4, the second calculation submodule 31112 is configured to calculate the confidence value of the unaudited labeled data item by formula (5), based on the labeling time point at which the labeler labeled the unaudited labeled data item and the error period corresponding to the incorrectly labeled items among the labeler's audited labeled data;
wherein M_j denotes the confidence value of the j-th unaudited labeled data item; t_ij denotes the labeling time point at which labeler i, the labeler of the j-th unaudited labeled data item, labeled that item; [t1_i, t2_i] denotes the error period corresponding to the incorrectly labeled items among the audited labeled data of labeler i; and m1 and m2 denote constants, with m2 greater than m1.
Optionally, as shown in Fig. 4, the second calculation submodule 31112 is configured to calculate the confidence value of the unaudited labeled data item by formula (6), based on the interval count between the unaudited labeled data item and its labeler's most recent incorrectly labeled data item and on the average interval count between occurrences of incorrectly labeled items among the labeler's audited labeled data;
wherein M_j denotes the confidence value of the j-th unaudited labeled data item; P_ij denotes the interval count between the j-th unaudited labeled data item and the most recent incorrectly labeled data item of its labeler i; Q_i denotes the average interval count between occurrences of incorrectly labeled items among the audited labeled data of labeler i; and k1 and k2 denote constants, with k1 greater than k2.
Optionally, as shown in Fig. 4, the second calculation submodule 31112 is configured to calculate the confidence value of the unaudited labeled data item by formula (7), based on the labeling time point at which the labeler labeled the unaudited labeled data item, the error period corresponding to the incorrectly labeled items among the labeler's audited labeled data, the interval count between the unaudited labeled data item and its labeler's most recent incorrectly labeled data item, and the average interval count between occurrences of incorrectly labeled items among the labeler's audited labeled data;
wherein M_j denotes the confidence value of the j-th unaudited labeled data item; t_ij denotes the labeling time point at which labeler i, the labeler of the j-th unaudited labeled data item, labeled that item; [t1_i, t2_i] denotes the error period corresponding to the incorrectly labeled items among the audited labeled data of labeler i; m1 and m2 denote constants, with m2 greater than m1; P_ij denotes the interval count between the j-th unaudited labeled data item and the most recent incorrectly labeled data item of its labeler i; Q_i denotes the average interval count between occurrences of incorrectly labeled items among the audited labeled data of labeler i; and k1 and k2 denote constants, with k1 greater than k2.
Optionally, as shown in Fig. 4, the second calculation submodule 31112 is configured to calculate the confidence value of the unaudited labeled data item by formula (8), based on the labeling duration the labeler spent on the unaudited labeled data item, the labeling time point at which the labeler labeled the item, the average labeling duration of the correctly labeled items among the labeler's audited labeled data, the error period corresponding to the incorrectly labeled items among the labeler's audited labeled data, the number of correctly labeled items among the labeler's audited labeled data, and the total amount of the labeler's audited labeled data;
wherein M_j denotes the confidence value of the j-th unaudited labeled data item; T_ij denotes the labeling duration that labeler i, the labeler of the j-th unaudited labeled data item, spent labeling that item; R_i denotes the average labeling duration of the correctly labeled items among the audited labeled data of labeler i; n denotes a constant greater than or equal to 1; t_ij denotes the labeling time point at which labeler i labeled the j-th unaudited labeled data item; [t1_i, t2_i] denotes the error period corresponding to the incorrectly labeled items among the audited labeled data of labeler i; m1 and m2 denote constants, with m2 greater than m1; P_ij denotes the interval count between the j-th unaudited labeled data item and the most recent incorrectly labeled data item of its labeler i; Q_i denotes the average interval count between occurrences of incorrectly labeled items among the audited labeled data of labeler i; k1 and k2 denote constants, with k1 greater than k2; α denotes the first weight; β denotes the second weight; and γ denotes the third weight.
Optionally, as shown in Fig. 4, the third determining module 3111 includes:
a third calculation submodule 31113, configured to calculate the confidence value of the unaudited labeled data item based on the content of the historical labeling behavior information of the item's labeler;
wherein the historical labeling behavior information of the labeler of the unaudited labeled data item includes the following: the number of correctly labeled items among the labeler's audited labeled data and the total amount of the labeler's audited labeled data.
Optionally, as shown in Fig. 4, the third calculation submodule 31113 is configured to determine, as the confidence value of the unaudited labeled data item, the ratio of the number of correctly labeled items among the labeler's audited labeled data to the total amount of the labeler's audited labeled data.
Optionally, as shown in Fig. 4, the device further includes:
an auxiliary unit 34, configured to assist the manual audit of labeled data based on the confidence values of the labeled data.
For the methods used by the functional modules of the audit push device for labeled data provided by this embodiment of the present invention, refer to the corresponding detailed descriptions of the method embodiments of Fig. 1 and Fig. 2; details are not repeated here.
Further, according to the above embodiments, another embodiment of the present invention provides a computer-readable storage medium. The storage medium includes a stored program, wherein, when the program runs, the device in which the storage medium is located is controlled to execute the audit push method for labeled data described in any one of the above.
Further, according to the above embodiments, another embodiment of the present invention provides a storage management device, including:
a memory, configured to store a program;
a processor, coupled to the memory and configured to run the program to execute the audit push method for labeled data described in any one of the above.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed in one embodiment, refer to the related descriptions of other embodiments.
The embodiments of the present invention disclose:
A1. An audit push method for labeled data, comprising:
determining the sorting of the unaudited labeled data in an unaudited labeled data set;
extracting unaudited labeled data from the unaudited labeled data set, based on the sorting of the unaudited labeled data in the set, and pushing them to a designated auditor; and
collecting the auditor's audit result data and, at a certain frequency, updating the sorting of the unaudited labeled data in the unaudited labeled data set based on the collected audit result data.
A2. The method according to A1, wherein determining the sorting of the unaudited labeled data in the unaudited labeled data set comprises:
determining the confidence value of each labeled data item in the unaudited labeled data set, the confidence value being related to the probability that the corresponding labeled data item is labeled correctly;
determining the sorting of the unaudited labeled data based on the confidence value of each labeled data item.
A3. The method according to A2, wherein determining the sorting of the unaudited labeled data based on the confidence value of each labeled data item comprises:
selecting, from the unaudited labeled data set, unaudited labeled data whose confidence values lie within a preset threshold interval;
determining the sorting of the selected unaudited labeled data according to the content of the audit behavior information of the labelers of the selected unaudited labeled data;
wherein the audit behavior information includes one or more of the following: the number of labeled data items the labeler has labeled correctly in succession, the number of labeled data items the labeler has labeled incorrectly in succession, and the number of times the labeler has been audited.
A4. The method according to A2, wherein determining the sorting of the unaudited labeled data based on the confidence value of each labeled data item comprises:
determining the ordering score of each unaudited labeled data item using a first formula;
the first formula being:
wherein S_j denotes the ordering score of the j-th unaudited labeled data item; M_j denotes the confidence value of the j-th unaudited labeled data item; and M_in denotes the n-th unaudited labeled data item of labeler i among the unaudited labeled data;
determining the sorting of the unaudited labeled data based on the magnitude of the ordering scores.
A5. The method according to A2, wherein determining the sorting of the unaudited labeled data based on the confidence value of each labeled data item comprises:
selecting, from the unaudited labeled data set, unaudited labeled data whose confidence values lie within a preset threshold interval;
determining the sorting of the selected unaudited labeled data based on the magnitude of their confidence values.
A6. The method according to A3 or A5, wherein, when the confidence value of each unaudited labeled data item is obtained based on the historical labeling behavior information of the item's labeler, or is obtained based on the labeler's labeling behavior information for that item, selecting, from the unaudited labeled data set, unaudited labeled data whose confidence values lie within a preset threshold interval comprises:
selecting, from the unaudited labeled data set, unaudited labeled data whose confidence values lie within a first threshold interval.
A7. The method according to A3 or A5, wherein, when each unaudited labeled data item has two confidence values, one obtained based on the historical labeling behavior information of the item's labeler, and the other obtained based on both the labeler's labeling behavior information for that item and the labeler's historical labeling behavior information, selecting, from the unaudited labeled data set, unaudited labeled data whose confidence values lie within a preset threshold interval comprises:
selecting, from the unaudited labeled data set, unaudited labeled data whose confidence value obtained from the labeler's historical labeling behavior information lies within a second threshold interval;
sorting the selected unaudited labeled data by the confidence value obtained from the labeler's labeling behavior information for the item together with the labeler's historical labeling behavior information, and selecting, from the selected unaudited labeled data, those whose confidence value lies within a third threshold interval.
A8. The method according to A2, wherein determining the confidence value of each labeled data item in the unaudited labeled data set comprises:
obtaining audit result data of audited labeled data, and determining, based on the audit result data, the confidence value of each labeled data item in the unaudited labeled data set; wherein,
when the unaudited labeled data set is being sorted for the first time, the audit result data are the audit result data of a set quantity of historical audited labeled data;
when the unaudited labeled data set is not being sorted for the first time, the audit result data are the audit result data of the set quantity of audited labeled data together with the collected audit result data of the auditor.
A9. The method according to A8, wherein determining, based on the audit result data, the confidence value of each labeled data item in the unaudited labeled data set comprises:
for each unaudited labeled data item, determining the confidence value of the item based on the information included in the audit result data;
wherein the audit result data include the following information: the historical labeling behavior information of the labeler of the unaudited labeled data item, and/or the labeler's labeling behavior information for the unaudited labeled data item.
A10. The method according to A9, wherein determining the confidence value of the unaudited labeled data item based on the information included in the audit result data comprises:
calculating the confidence value of the unaudited labeled data item based on the content of the historical labeling behavior information of the item's labeler;
wherein the historical labeling behavior information includes the following: for the labeler of the unaudited labeled data item, the number of correctly labeled items and the number of incorrectly labeled items among the labeler's audited labeled data.
A11. The method according to A10, wherein calculating the confidence value of the unaudited labeled data item based on the content of the historical labeling behavior information of the item's labeler comprises:
calculating the confidence value of the unaudited labeled data item by a second formula;
the second formula being:
wherein M_j denotes the confidence value of the j-th unaudited labeled data item; A_i denotes the number of correctly labeled items among the audited labeled data of labeler i, the labeler of the j-th unaudited labeled data item; B_i denotes the number of incorrectly labeled items among the audited labeled data of labeler i; a denotes a constant greater than 0; and b denotes a constant greater than 0.
A12. The method according to A10, wherein calculating the confidence value of the unaudited labeled data item based on the content of the historical labeling behavior information of the item's labeler comprises:
calculating the confidence value of the unaudited labeled data item by a third formula;
the third formula being:
wherein M_j denotes the confidence value of the j-th unaudited labeled data item; A_i denotes the number of correctly labeled items among the audited labeled data of labeler i, the labeler of the j-th unaudited labeled data item; B_i denotes the number of incorrectly labeled items among the audited labeled data of labeler i; e denotes a constant greater than 0; f denotes a constant greater than 0; and g denotes a constant greater than 0.
A13. The method according to A9, wherein determining the confidence value of the unaudited labeled data item based on the information included in the audit result data comprises:
calculating the confidence value of the unaudited labeled data item based on the content of the labeler's labeling behavior information for the unaudited labeled data item and the content of the labeler's historical labeling behavior information; wherein,
the labeler's labeling behavior information for the unaudited labeled data item includes one or more of the following: the labeling duration the labeler spent on the unaudited labeled data item, the labeling time point at which the labeler labeled the item, and the interval count between the item and the labeler's most recent incorrectly labeled data item;
the historical labeling behavior information of the labeler of the unaudited labeled data item includes one or more of the following: the average labeling duration of the correctly labeled items among the labeler's audited labeled data, the error period corresponding to the incorrectly labeled items among the labeler's audited labeled data, the average interval count between occurrences of incorrectly labeled items among the labeler's audited labeled data, the number of correctly labeled items among the labeler's audited labeled data, and the total amount of the labeler's audited labeled data.
A14. The method according to A13, wherein calculating the confidence value of the unaudited labeled data, based on the content of the labeler's labeling behavior information for the unaudited labeled data and the content of the labeler's historical labeling behavior information, comprises:
Calculating the confidence value of the unaudited labeled data by a fourth formula, based on the labeling duration the labeler spent on the unaudited labeled data and the average labeling duration of correctly labeled data among the labeler's audited labeled data;
The fourth formula is as follows:
Wherein M_j denotes the confidence value of the j-th unaudited labeled data item; T_ij denotes the labeling duration that labeler i spent on the j-th unaudited labeled data item; R_i denotes the average labeling duration of correctly labeled data among the audited labeled data of labeler i; n denotes a constant greater than or equal to 1.
A15. The method according to A13, wherein calculating the confidence value of the unaudited labeled data, based on the content of the labeler's labeling behavior information for the unaudited labeled data and the content of the labeler's historical labeling behavior information, comprises:
Calculating the confidence value of the unaudited labeled data by a fifth formula, based on the time point at which the labeler labeled the unaudited labeled data and the error time period corresponding to incorrectly labeled data among the labeler's audited labeled data;
The fifth formula is as follows:
Wherein M_j denotes the confidence value of the j-th unaudited labeled data item; t_ij denotes the time point at which labeler i labeled the j-th unaudited labeled data item; [t1_i, t2_i] denotes the error time period corresponding to incorrectly labeled data among the audited labeled data of labeler i; m1 and m2 denote constants, and m2 is greater than m1.
A16. The method according to A13, wherein calculating the confidence value of the unaudited labeled data, based on the content of the labeler's labeling behavior information for the unaudited labeled data and the content of the labeler's historical labeling behavior information, comprises:
Calculating the confidence value of the unaudited labeled data by a sixth formula, based on the interval count between the unaudited labeled data and the most recent incorrectly labeled data of its labeler, and the average interval count at which incorrectly labeled data occurs among the labeler's audited labeled data;
The sixth formula is as follows:
Wherein M_j denotes the confidence value of the j-th unaudited labeled data item; P_ij denotes the interval count between the j-th unaudited labeled data item and the most recent incorrectly labeled data of its labeler i; Q_i denotes the average interval count at which incorrectly labeled data occurs among the audited labeled data of labeler i; k1 and k2 denote constants, and k1 is greater than k2.
A17. The method according to A13, wherein calculating the confidence value of the unaudited labeled data, based on the content of the labeler's labeling behavior information for the unaudited labeled data and the content of the labeler's historical labeling behavior information, comprises:
Calculating the confidence value of the unaudited labeled data by a seventh formula, based on the time point at which the labeler labeled the unaudited labeled data, the error time period corresponding to incorrectly labeled data among the labeler's audited labeled data, the interval count between the unaudited labeled data and the most recent incorrectly labeled data of its labeler, and the average interval count at which incorrectly labeled data occurs among the labeler's audited labeled data;
The seventh formula is as follows:
Wherein M_j denotes the confidence value of the j-th unaudited labeled data item; t_ij denotes the time point at which labeler i labeled the j-th unaudited labeled data item; [t1_i, t2_i] denotes the error time period corresponding to incorrectly labeled data among the audited labeled data of labeler i; m1 and m2 denote constants, and m2 is greater than m1; P_ij denotes the interval count between the j-th unaudited labeled data item and the most recent incorrectly labeled data of its labeler i; Q_i denotes the average interval count at which incorrectly labeled data occurs among the audited labeled data of labeler i; k1 and k2 denote constants, and k1 is greater than k2.
A18. The method according to A13, wherein calculating the confidence value of the unaudited labeled data, based on the content of the labeler's labeling behavior information for the unaudited labeled data and the content of the labeler's historical labeling behavior information, comprises:
Calculating the confidence value of the unaudited labeled data by an eighth formula, based on the labeling duration the labeler spent on the unaudited labeled data, the time point at which the labeler labeled the unaudited labeled data, the average labeling duration of correctly labeled data among the labeler's audited labeled data, the error time period corresponding to incorrectly labeled data among the labeler's audited labeled data, the quantity of correctly labeled data among the labeler's audited labeled data, and the total quantity of the labeler's audited labeled data;
The eighth formula is as follows:
Wherein M_j denotes the confidence value of the j-th unaudited labeled data item; T_ij denotes the labeling duration that labeler i spent on the j-th unaudited labeled data item; R_i denotes the average labeling duration of correctly labeled data among the audited labeled data of labeler i; n denotes a constant greater than or equal to 1; t_ij denotes the time point at which labeler i labeled the j-th unaudited labeled data item; [t1_i, t2_i] denotes the error time period corresponding to incorrectly labeled data among the audited labeled data of labeler i; m1 and m2 denote constants, and m2 is greater than m1; P_ij denotes the interval count between the j-th unaudited labeled data item and the most recent incorrectly labeled data of its labeler i; Q_i denotes the average interval count at which incorrectly labeled data occurs among the audited labeled data of labeler i; k1 and k2 denote constants, and k1 is greater than k2; α denotes a first weight; β denotes a second weight; γ denotes a third weight.
A19. The method according to A9, wherein determining the confidence value of the unaudited labeled data based on the information included in the audit result data comprises:
Calculating the confidence value of the unaudited labeled data based on the content of the historical labeling behavior information of the labeler of the unaudited labeled data;
Wherein the historical labeling behavior information of the labeler of the unaudited labeled data includes the following: the quantity of correctly labeled data among the labeler's audited labeled data and the total quantity of the labeler's audited labeled data.
A20. The method according to A19, wherein calculating the confidence value of the unaudited labeled data based on the content of the historical labeling behavior information of the labeler of the unaudited labeled data comprises:
Determining, as the confidence value of the unaudited labeled data, the ratio of the quantity of correctly labeled data among the labeler's audited labeled data to the total quantity of the labeler's audited labeled data.
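The ratio in A20 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the names `ratio_confidence`, `correct_count`, and `total_audited` are hypothetical, and the value returned when the labeler has no audited data is an assumption, since the text does not address that case.

```python
def ratio_confidence(correct_count: int, total_audited: int) -> float:
    """Share of a labeler's audited items that were labeled correctly.

    Hypothetical helper: the name and the zero-history fallback are
    assumptions made for illustration, not taken from the source text.
    """
    if correct_count < 0 or total_audited < 0 or correct_count > total_audited:
        raise ValueError("counts must satisfy 0 <= correct_count <= total_audited")
    if total_audited == 0:
        return 0.0  # assumed default for a labeler with no audited history
    return correct_count / total_audited


# Every unaudited item from the same labeler receives the same value,
# because the ratio depends only on that labeler's audited history.
confidence = ratio_confidence(correct_count=45, total_audited=50)
```

Because the value depends only on the labeler, this variant cannot distinguish between individual items from the same labeler; the per-item inputs listed in A13 address that.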
A21. The method according to any one of A1-A5 and A8-A20, further comprising: assisting the manual audit process of the labeled data based on the confidence values of the labeled data.
B1. An audit pushing device for labeled data, comprising:
A determination unit, configured to determine the ordering of the unaudited labeled data items in an unaudited labeled data set;
A push unit, configured to extract unaudited labeled data from the unaudited labeled data set, based on the ordering of the unaudited labeled data items in the set, and push it to a designated auditor; and
An updating unit, configured to collect the audit result data of the auditor and, at a certain frequency, update the ordering of the unaudited labeled data items in the unaudited labeled data set based on the collected audit result data.
B2. The device according to B1, wherein the determination unit comprises:
A first determination subunit, configured to determine the confidence value of each labeled data item in the unaudited labeled data set, the confidence value being related to the probability that the label of the corresponding labeled data item is correct;
A second determination subunit, configured to determine the ordering of the unaudited labeled data items based on the confidence value of each labeled data item.
B3. The device according to B2, wherein the second determination subunit comprises:
A first selection module, configured to select, from the unaudited labeled data set, unaudited labeled data whose confidence value lies within a preset threshold interval;
A first determining module, configured to determine the ordering of the selected unaudited labeled data items according to the content of the audit behavior information of the labelers of the selected unaudited labeled data items;
Wherein the audit behavior information includes one or more of the following: the quantity of labeled data that the labeler has labeled correctly in succession, the quantity of labeled data that the labeler has labeled incorrectly in succession, and the number of times the labeler has been audited.
B4. The device according to B2, wherein the determination unit comprises:
A third determination subunit, configured to determine the ordering score of each unaudited labeled data item using a first formula, and to determine the ordering of the unaudited labeled data items based on the size of the ordering scores;
The first formula is as follows:
Wherein S_j denotes the ordering score of the j-th unaudited labeled data item; M_j denotes the confidence value of the j-th unaudited labeled data item; M_in corresponds to the n-th unaudited labeled data item of labeler i among the unaudited labeled data.
B5. The device according to B2, wherein the second determination subunit comprises:
A second selection module, configured to select, from the unaudited labeled data set, unaudited labeled data whose confidence value lies within a preset threshold interval;
A second determining module, configured to determine the ordering of the selected unaudited labeled data items based on the size of the confidence values of the selected labeled data.
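The selection and ordering in B5 can be sketched as a filter followed by a sort. The sketch rests on assumptions: the function name `order_by_confidence` is hypothetical, the threshold interval is treated as closed, and ascending order (least confident first) is an illustrative default, since the text only says the ordering is based on the size of the confidence value.

```python
from typing import Dict, List, Tuple


def order_by_confidence(
    confidences: Dict[str, float],
    interval: Tuple[float, float],
    ascending: bool = True,
) -> List[str]:
    """Select items whose confidence lies in `interval`, then sort them.

    Hypothetical helper. The closed interval and the default sort
    direction are assumptions, not specified by the source text.
    """
    low, high = interval
    # Selection step: keep only items inside the preset threshold interval.
    selected = [item for item, m in confidences.items() if low <= m <= high]
    # Ordering step: sort the selected items by their confidence value.
    return sorted(selected, key=lambda item: confidences[item], reverse=not ascending)


example = {"a": 0.9, "b": 0.2, "c": 0.5, "d": 0.35}
ordered = order_by_confidence(example, interval=(0.3, 0.95))
```

With these inputs, item "b" falls outside the interval and is dropped, and the remaining items are ordered from lowest to highest confidence.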
B6. The device according to B3 or B5, wherein, when the confidence value of each unaudited labeled data item is obtained based on the historical labeling behavior information of the labeler of the labeled data, or is obtained based on the labeling behavior information of the labeler of the labeled data for that labeled data,
The selection module is configured to select, from the unaudited labeled data set, unaudited labeled data whose confidence value lies within a first threshold interval.
B7. The device according to B3 or B5, wherein, when an unaudited labeled data item has two corresponding confidence values, one obtained based on the historical labeling behavior information of the labeler of the labeled data, and the other obtained based on both the labeling behavior information of the labeler for that labeled data and the historical labeling behavior information of the labeler,
The selection module is configured to: select, from the unaudited labeled data set, unaudited labeled data whose confidence value obtained from the labeler's historical labeling behavior information lies within a second threshold interval; sort the selected unaudited labeled data by the confidence value obtained from the labeler's labeling behavior information for the labeled data and the labeler's historical labeling behavior information; and select, from the selected unaudited labeled data, unaudited labeled data whose confidence value lies within a third threshold interval.
B8. The device according to B2, wherein the first determination subunit comprises:
A third determining module, configured to obtain the audit result data of audited labeled data, and to determine, based on the audit result data, the confidence value of each labeled data item in the unaudited labeled data set; wherein,
When the unaudited labeled data set is being sorted for the first time, the audit result data are the audit result data of a set quantity of historically audited labeled data;
When the unaudited labeled data set is not being sorted for the first time, the audit result data are the audit result data of the set quantity of audited labeled data together with the collected audit result data of the auditor.
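The two cases in B8 can be sketched as one function that assembles the audit result data used for scoring. The names (`audit_results_for_sort`, `historical`, `collected`, `set_quantity`) are hypothetical, records are represented as plain dictionaries for illustration, and taking the most recent `set_quantity` historical records is an assumption: the text only says that a set quantity is used.

```python
from typing import List, Optional


def audit_results_for_sort(
    historical: List[dict],
    collected: Optional[List[dict]],
    first_sort: bool,
    set_quantity: int,
) -> List[dict]:
    """Assemble the audit result data for one sorting pass.

    Hypothetical helper. Using the last `set_quantity` historical
    records is an assumption made for illustration.
    """
    # Guard against set_quantity == 0, because historical[-0:] is the whole list.
    base = list(historical[-set_quantity:]) if set_quantity > 0 else []
    if first_sort:
        # First sort: only historical audit results are available.
        return base
    # Later sorts: historical results plus results collected from the auditor.
    return base + list(collected or [])


history = [{"id": 1}, {"id": 2}, {"id": 3}]
fresh = [{"id": 4}]
first_pass = audit_results_for_sort(history, None, first_sort=True, set_quantity=2)
later_pass = audit_results_for_sort(history, fresh, first_sort=False, set_quantity=2)
```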
B9. The device according to B8, wherein the third determining module is configured to determine, for each unaudited labeled data item, the confidence value of that unaudited labeled data item based on the information included in the audit result data;
Wherein the audit result data include the following information: the historical labeling behavior information of the labeler of the unaudited labeled data, and/or the labeling behavior information of the labeler for the unaudited labeled data.
B10. The device according to B9, wherein the third determining module comprises:
A first calculation submodule, configured to calculate the confidence value of the unaudited labeled data based on the content of the historical labeling behavior information of the labeler of the unaudited labeled data;
Wherein the historical labeling behavior information includes the following: the quantity of correctly labeled data and the quantity of incorrectly labeled data among the audited labeled data of the labeler of the unaudited labeled data.
B11. The device according to B10, wherein the first calculation submodule is configured to calculate the confidence value of the unaudited labeled data by a second formula;
The second formula is as follows:
Wherein M_j denotes the confidence value of the j-th unaudited labeled data item; A_i denotes the quantity of correctly labeled data among the audited labeled data of labeler i, the labeler of the j-th unaudited labeled data item; B_i denotes the quantity of incorrectly labeled data among the audited labeled data of labeler i; a denotes a constant greater than 0; b denotes a constant greater than 0.
B12. The device according to B10, wherein the first calculation submodule is configured to calculate the confidence value of the unaudited labeled data by a third formula;
The third formula is as follows:
Wherein M_j denotes the confidence value of the j-th unaudited labeled data item; A_i denotes the quantity of correctly labeled data among the audited labeled data of labeler i, the labeler of the j-th unaudited labeled data item; B_i denotes the quantity of incorrectly labeled data among the audited labeled data of labeler i; e denotes a constant greater than 0; f denotes a constant greater than 0; g denotes a constant greater than 0.
B13. The device according to B9, wherein the third determining module comprises:
A second calculation submodule, configured to calculate the confidence value of the unaudited labeled data based on the content of the labeling behavior information of the labeler of the unaudited labeled data for that unaudited labeled data and the content of the historical labeling behavior information of the labeler of the unaudited labeled data; wherein,
The labeling behavior information of the labeler for the unaudited labeled data includes one or more of the following: the labeling duration the labeler spent on the unaudited labeled data, the time point at which the labeler labeled the unaudited labeled data, and the interval count between the unaudited labeled data and the most recent incorrectly labeled data of its labeler;
The historical labeling behavior information of the labeler of the unaudited labeled data includes one or more of the following: the average labeling duration of correctly labeled data among the labeler's audited labeled data, the error time period corresponding to incorrectly labeled data among the labeler's audited labeled data, the average interval count at which incorrectly labeled data occurs among the labeler's audited labeled data, the quantity of correctly labeled data among the labeler's audited labeled data, and the total quantity of the labeler's audited labeled data.
B14. The device according to B13, wherein the second calculation submodule is configured to calculate the confidence value of the unaudited labeled data by a fourth formula, based on the labeling duration the labeler spent on the unaudited labeled data and the average labeling duration of correctly labeled data among the labeler's audited labeled data;
The fourth formula is as follows:
Wherein M_j denotes the confidence value of the j-th unaudited labeled data item; T_ij denotes the labeling duration that labeler i spent on the j-th unaudited labeled data item; R_i denotes the average labeling duration of correctly labeled data among the audited labeled data of labeler i; n denotes a constant greater than or equal to 1.
B15. The device according to B13, wherein the second calculation submodule is configured to calculate the confidence value of the unaudited labeled data by a fifth formula, based on the time point at which the labeler labeled the unaudited labeled data and the error time period corresponding to incorrectly labeled data among the labeler's audited labeled data;
The fifth formula is as follows:
Wherein M_j denotes the confidence value of the j-th unaudited labeled data item; t_ij denotes the time point at which labeler i labeled the j-th unaudited labeled data item; [t1_i, t2_i] denotes the error time period corresponding to incorrectly labeled data among the audited labeled data of labeler i; m1 and m2 denote constants, and m2 is greater than m1.
B16. The device according to B13, wherein the second calculation submodule is configured to calculate the confidence value of the unaudited labeled data by a sixth formula, based on the interval count between the unaudited labeled data and the most recent incorrectly labeled data of its labeler, and the average interval count at which incorrectly labeled data occurs among the labeler's audited labeled data;
The sixth formula is as follows:
Wherein M_j denotes the confidence value of the j-th unaudited labeled data item; P_ij denotes the interval count between the j-th unaudited labeled data item and the most recent incorrectly labeled data of its labeler i; Q_i denotes the average interval count at which incorrectly labeled data occurs among the audited labeled data of labeler i; k1 and k2 denote constants, and k1 is greater than k2.
B17. The device according to B13, wherein the second calculation submodule is configured to calculate the confidence value of the unaudited labeled data by a seventh formula, based on the time point at which the labeler labeled the unaudited labeled data, the error time period corresponding to incorrectly labeled data among the labeler's audited labeled data, the interval count between the unaudited labeled data and the most recent incorrectly labeled data of its labeler, and the average interval count at which incorrectly labeled data occurs among the labeler's audited labeled data;
The seventh formula is as follows:
Wherein M_j denotes the confidence value of the j-th unaudited labeled data item; t_ij denotes the time point at which labeler i labeled the j-th unaudited labeled data item; [t1_i, t2_i] denotes the error time period corresponding to incorrectly labeled data among the audited labeled data of labeler i; m1 and m2 denote constants, and m2 is greater than m1; P_ij denotes the interval count between the j-th unaudited labeled data item and the most recent incorrectly labeled data of its labeler i; Q_i denotes the average interval count at which incorrectly labeled data occurs among the audited labeled data of labeler i; k1 and k2 denote constants, and k1 is greater than k2.
B18. The device according to B13, wherein the second calculation submodule is configured to calculate the confidence value of the unaudited labeled data by an eighth formula, based on the labeling duration the labeler spent on the unaudited labeled data, the time point at which the labeler labeled the unaudited labeled data, the average labeling duration of correctly labeled data among the labeler's audited labeled data, the error time period corresponding to incorrectly labeled data among the labeler's audited labeled data, the quantity of correctly labeled data among the labeler's audited labeled data, and the total quantity of the labeler's audited labeled data;
The eighth formula is as follows:
Wherein M_j denotes the confidence value of the j-th unaudited labeled data item; T_ij denotes the labeling duration that labeler i spent on the j-th unaudited labeled data item; R_i denotes the average labeling duration of correctly labeled data among the audited labeled data of labeler i; n denotes a constant greater than or equal to 1; t_ij denotes the time point at which labeler i labeled the j-th unaudited labeled data item; [t1_i, t2_i] denotes the error time period corresponding to incorrectly labeled data among the audited labeled data of labeler i; m1 and m2 denote constants, and m2 is greater than m1; P_ij denotes the interval count between the j-th unaudited labeled data item and the most recent incorrectly labeled data of its labeler i; Q_i denotes the average interval count at which incorrectly labeled data occurs among the audited labeled data of labeler i; k1 and k2 denote constants, and k1 is greater than k2; α denotes a first weight; β denotes a second weight; γ denotes a third weight.
B19. The device according to B9, wherein the third determining module comprises:
A third calculation submodule, configured to calculate the confidence value of the unaudited labeled data based on the content of the historical labeling behavior information of the labeler of the unaudited labeled data;
Wherein the historical labeling behavior information of the labeler of the unaudited labeled data includes the following: the quantity of correctly labeled data among the labeler's audited labeled data and the total quantity of the labeler's audited labeled data.
B20. The device according to B19, wherein the third calculation submodule is configured to determine, as the confidence value of the unaudited labeled data, the ratio of the quantity of correctly labeled data among the labeler's audited labeled data to the total quantity of the labeler's audited labeled data.
B21. The device according to any one of B1-B5 and B8-B20, further comprising:
An auxiliary unit, configured to assist the manual audit process of the labeled data based on the confidence values of the labeled data.
C1. A computer-readable storage medium, comprising a stored program, wherein, when the program runs, the device on which the storage medium resides is controlled to execute the audit pushing method for labeled data according to any one of A1 to A21.
D1. A storage management device, comprising:
A memory, configured to store a program;
A processor, coupled to the memory and configured to run the program so as to execute the audit pushing method for labeled data according to any one of A1 to A21.
It can be understood that related features of the above method and device may be cross-referenced. In addition, "first", "second", and the like in the above embodiments are used to distinguish the embodiments and do not represent the relative merits of the embodiments.
It is apparent to those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to construct such a system is apparent. In addition, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein may be implemented in various programming languages, and that the above description of a specific language is given to disclose the best mode of the invention.
Numerous specific details are set forth in the description provided here. It should be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques are not shown in detail, so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects, features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. However, this method of disclosure is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will understand that the modules in the device of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of the embodiments may be combined into one module, unit, or component, and may furthermore be divided into a plurality of sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.
Furthermore, those skilled in the art will appreciate that, although some embodiments described herein include certain features that are included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the operating method, device, and framework of the deep neural network model according to embodiments of the invention. The invention may also be implemented as a device or device program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, and third does not indicate any order. These words may be interpreted as names.

Claims (10)

1. A method for auditing and pushing labeled data, characterized by comprising:
determining a ranking of the unaudited labeled data items in an unaudited labeled data set;
based on the ranking of the unaudited labeled data items in the unaudited labeled data set, extracting unaudited labeled data from the unaudited labeled data set and pushing it to a designated auditor; and
collecting audit result data from the auditor and, at a certain frequency, updating the ranking of the unaudited labeled data items in the unaudited labeled data set based on the collected audit result data.
2. The method according to claim 1, characterized in that determining the ranking of the unaudited labeled data items in the unaudited labeled data set comprises:
determining a confidence value for each labeled data item in the unaudited labeled data set, the confidence value being related to the probability that the corresponding labeled data item is labeled correctly;
determining the ranking of the unaudited labeled data items based on the confidence value of each labeled data item.
3. The method according to claim 2, characterized in that determining the ranking of the unaudited labeled data items based on the confidence value of each labeled data item comprises:
selecting, from the unaudited labeled data set, unaudited labeled data whose confidence value lies within a preset threshold interval;
determining the ranking of the selected unaudited labeled data items according to the content of the audit behavior information of the labelers of the selected unaudited labeled data items;
wherein the audit behavior information includes one or more of the following: the number of labeled data items that the labeler has consecutively labeled correctly, the number of labeled data items that the labeler has consecutively labeled incorrectly, and the number of times the labeler has been audited.
4. The method according to claim 2, characterized in that determining the ranking of the unaudited labeled data items based on the confidence value of each labeled data item comprises:
determining a ranking score for each unaudited labeled data item using a first formula;
the first formula being:
wherein S_j denotes the ranking score of the j-th unaudited labeled data item; M_j denotes the confidence value of the j-th unaudited labeled data item; and M_in denotes the n-th unaudited labeled data item of labeler i among the unaudited labeled data items;
determining the ranking of the unaudited labeled data items based on the magnitude of the ranking scores.
5. The method according to claim 2, characterized in that determining the ranking of the unaudited labeled data items based on the confidence value of each labeled data item comprises:
selecting, from the unaudited labeled data set, unaudited labeled data whose confidence value lies within a preset threshold interval;
determining the ranking of the selected unaudited labeled data items based on the magnitude of their confidence values.
6. The method according to claim 3 or 5, characterized in that, when the confidence value of each unaudited labeled data item is obtained based on the historical labeling behavior information of the labeler of the labeled data, or is obtained based on the labeling behavior information of the labeler of the labeled data for that labeled data, selecting, from the unaudited labeled data set, unaudited labeled data whose confidence value lies within a preset threshold interval comprises:
selecting, from the unaudited labeled data set, unaudited labeled data whose confidence value lies within a first threshold interval.
7. The method according to claim 3 or 5, characterized in that, when each unaudited labeled data item has two corresponding confidence values, one obtained based on the historical labeling behavior information of the labeler of the labeled data and the other obtained based on both the labeling behavior information of the labeler of the labeled data for that labeled data and the historical labeling behavior information of the labeler of the labeled data, selecting, from the unaudited labeled data set, unaudited labeled data whose confidence value lies within a preset threshold interval comprises:
selecting, from the unaudited labeled data set and according to the confidence value obtained from the historical labeling behavior information of the labeler of the labeled data, unaudited labeled data whose confidence value lies within a second threshold interval;
ranking the selected unaudited labeled data according to the confidence value obtained from the labeling behavior information of the labeler of the labeled data for that labeled data and the historical labeling behavior information of the labeler of the labeled data, and selecting, from the selected unaudited labeled data, unaudited labeled data whose confidence value lies within a third threshold interval.
8. A device for auditing and pushing labeled data, characterized by comprising:
a determination unit, configured to determine a ranking of the unaudited labeled data items in an unaudited labeled data set;
a push unit, configured to extract, based on the ranking of the unaudited labeled data items in the unaudited labeled data set, unaudited labeled data from the unaudited labeled data set and push it to a designated auditor; and
an update unit, configured to collect audit result data from the auditor and, at a certain frequency, update the ranking of the unaudited labeled data items in the unaudited labeled data set based on the collected audit result data.
9. A computer-readable storage medium, characterized in that the storage medium includes a stored program, wherein, when the program runs, the device on which the storage medium resides is controlled to execute the method for auditing and pushing labeled data according to any one of claims 1 to 7.
10. A storage management device, characterized by comprising:
a memory, configured to store a program;
a processor, coupled to the memory and configured to run the program so as to execute the method for auditing and pushing labeled data according to any one of claims 1 to 7.
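The loop described in claims 1, 2 and 5 (rank by confidence within a threshold interval, push a batch to the auditor, then update the ranking from the audit results) can be illustrated with a minimal Python sketch. It is not the patented implementation. The class and function names, the interval [0.3, 0.7], the least-confident-first ordering, and the per-labeler update rule with a fixed step are all assumptions made for illustration; the claims fix none of them, and the first formula of claim 4 is not used here.

```python
from dataclasses import dataclass


@dataclass
class LabeledItem:
    """One unaudited labeled data item (all field names are hypothetical)."""
    item_id: str
    labeler_id: str
    confidence: float  # estimated probability that the label is correct


def rank_unaudited(items, low=0.3, high=0.7):
    """Keep items whose confidence lies in [low, high] and sort them.

    Assumption: least confident items come first; the claims only say
    the ranking depends on the magnitude of the confidence value.
    """
    selected = [it for it in items if low <= it.confidence <= high]
    return sorted(selected, key=lambda it: it.confidence)


def push_batch(ranked, batch_size):
    """Take the head of the ranking as the batch sent to the auditor."""
    return ranked[:batch_size]


def apply_audit_results(pool, results, step=0.1):
    """Drop audited items and adjust the remaining confidences.

    `results` maps item_id -> True (label correct) / False (label wrong).
    Assumed update rule: every remaining item by the same labeler moves
    `step` up after a correct result and `step` down after a wrong one,
    clamped to [0, 1].
    """
    by_id = {it.item_id: it for it in pool}
    remaining = [it for it in pool if it.item_id not in results]
    for item_id, was_correct in results.items():
        labeler = by_id[item_id].labeler_id
        delta = step if was_correct else -step
        for it in remaining:
            if it.labeler_id == labeler:
                it.confidence = min(1.0, max(0.0, it.confidence + delta))
    return remaining


pool = [
    LabeledItem("a", "w1", 0.9),
    LabeledItem("b", "w1", 0.5),
    LabeledItem("c", "w2", 0.4),
    LabeledItem("d", "w2", 0.65),
]
batch = push_batch(rank_unaudited(pool), batch_size=2)   # items c and b
pool = apply_audit_results(pool, {"c": False, "b": True})
next_batch = push_batch(rank_unaudited(pool), batch_size=2)
print([it.item_id for it in batch], [it.item_id for it in next_batch])
```

In this sketch the ranking is refreshed after every batch; the "certain frequency" of claim 1 could equally be a timer or a count of collected audit results.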
CN201910458916.4A 2019-05-29 2019-05-29 Method and device for auditing and pushing labeled data Active CN110222244B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910458916.4A CN110222244B (en) 2019-05-29 2019-05-29 Method and device for auditing and pushing labeled data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910458916.4A CN110222244B (en) 2019-05-29 2019-05-29 Method and device for auditing and pushing labeled data

Publications (2)

Publication Number Publication Date
CN110222244A (en) 2019-09-10
CN110222244B (en) 2022-03-01

Family

ID=67818900

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910458916.4A Active CN110222244B (en) 2019-05-29 2019-05-29 Method and device for auditing and pushing labeled data

Country Status (1)

Country Link
CN (1) CN110222244B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112270533A (en) * 2020-11-12 2021-01-26 北京百度网讯科技有限公司 Data processing method and device, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101968806A (en) * 2010-10-22 2011-02-09 天津南大通用数据技术有限公司 Data storage method, querying method and device
CN102117302A (en) * 2009-12-31 2011-07-06 南京理工大学 Data origin tracking method on sensor data stream complex query results
CN106485528A (en) * 2015-09-01 2017-03-08 阿里巴巴集团控股有限公司 Method and apparatus for detecting data
US10009358B1 (en) * 2014-02-11 2018-06-26 DataVisor Inc. Graph based framework for detecting malicious or compromised accounts

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102117302A (en) * 2009-12-31 2011-07-06 南京理工大学 Data origin tracking method on sensor data stream complex query results
CN101968806A (en) * 2010-10-22 2011-02-09 天津南大通用数据技术有限公司 Data storage method, querying method and device
US10009358B1 (en) * 2014-02-11 2018-06-26 DataVisor Inc. Graph based framework for detecting malicious or compromised accounts
CN106485528A (en) * 2015-09-01 2017-03-08 阿里巴巴集团控股有限公司 Method and apparatus for detecting data

Also Published As

Publication number Publication date
CN110222244B (en) 2022-03-01

Similar Documents

Publication Publication Date Title
CN103810030B (en) Application recommendation method, apparatus and system based on a mobile terminal application market
CN109522556A (en) Intent recognition method and device
CN110232060A (en) Method and device for auditing labeled data
US20190371438A1 (en) Computer-implemented system and method of facilitating artificial intelligence based revenue cycle management in healthcare
CN106156092B (en) Data processing method and device
US9122995B2 (en) Classification of stream-based data using machine learning
CN106503006A (en) Method and device for ranking sub-applications within an application (App)
WO2004061740B1 (en) A surveying apparatus and method for compensation reports
CN110162566A (en) Association analysis method and device for business data, computer equipment and storage medium
CN109102332A (en) Data processing method, apparatus and electronic device
CN108509461A (en) Learning-to-rank method and server based on reinforcement learning
CN110263818A (en) Resume screening method, apparatus, terminal and computer-readable storage medium
CN110659985A (en) Method and device for fishing back false rejection potential user and electronic equipment
CN109886778A (en) Recommendation method and system for products bundled with air tickets
CN115145812B (en) Test case generation method and device, electronic equipment and storage medium
CN107590195A (en) Text classification model training method, text classification method and devices thereof
CN108256970A (en) Method for recommending products based on shopping needs
CN108090503A (en) Online tuning method and apparatus for multiple classifiers, storage medium and electronic device
CN107679884A (en) Group premium assessment method, apparatus, computer equipment and storage medium
CN110458600A (en) Portrait model training method, device, computer equipment and storage medium
CN108805580A (en) Account analysis method, device and storage medium
CN110222244A (en) Method and device for auditing and pushing labeled data
WO2000054186A1 (en) Financial forecasting system and method for risk assessment and management
CN105160003B (en) APP retrieval ranking method and system based on geographic location
CN111160647B (en) Money laundering behavior prediction method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant