CN108665158A

CN108665158A - Method, apparatus, and device for training a risk control model

Info

Publication number: CN108665158A
Application number: CN201810431886.3A
Authority: CN
Inventors: 陆毅成; 陈弢; 韦晓倩; 杨维嘉; 程羽
Original assignee: Alibaba Group Holding Ltd
Current assignee: Advanced New Technologies Co Ltd; Advantageous New Technologies Co Ltd
Priority date: 2018-05-08
Filing date: 2018-05-08
Publication date: 2018-10-16

Abstract

The embodiments of this specification disclose a method, apparatus, and device for training a risk control model. In these embodiments, the existing labeled samples form the labeled set, and the samples that would have to be labeled under the prior art form the unlabeled set. A risk control model is first trained on the limited labeled samples in the labeled set, and the model is then updated iteratively. In each iteration, several unlabeled samples are taken from the unlabeled set and labeled, based on the labeled set, to update the labeled set; the risk control model is then retrained on the updated labeled set, which completes one iteration.

Description

Method, apparatus, and device for training a risk control model

Technical field

This specification relates to the field of information technology, and in particular to a method, apparatus, and device for training a risk control model.

Background

The importance of risk control work is self-evident in many industries. For example, the risk control work of an electronic payment platform can identify, among the transfers the platform handles, those suspected of illicit cash-out (for instance, the payer uses a credit card to transfer money to the payee, and after receiving the money the payee returns cash to the payer, thereby helping the payer fraudulently obtain cash from the credit card).

In general, technical staff tend to train risk control models with traditional supervised learning algorithms (such as classification trees and random forests) and use them to identify risk in each transaction. In the example above, historical transaction records can be collected as unlabeled samples; samples known to involve cash-out are labeled "1" and samples known not to involve cash-out are labeled "0". The labeled samples are then used as input to train a risk control model that identifies whether an incoming transfer is suspected of cash-out. Note that a massive number of labeled samples is generally needed before training yields a model that identifies risk accurately.

In view of the prior art, a lower-cost method of training a risk control model is needed.

Summary of the Invention

The embodiments of this specification provide a method, apparatus, and device for training a risk control model, to solve the problem that existing methods of training a risk control model are too costly.

To solve the above technical problem, the embodiments of this specification are implemented as follows:

An embodiment of this specification provides a method for training a risk control model, in which a labeled set contains labeled samples and an unlabeled set contains unlabeled samples, the method comprising:

training a risk control model according to the labeled set;

performing uncertainty analysis, based on the trained risk control model, on the unlabeled samples in the unlabeled set;

taking several unlabeled samples out of the unlabeled set according to the analysis result, and using the unlabeled samples that were not taken out as an unlabeled subset;

providing the unlabeled samples taken out to a labeling party for labeling, and receiving the labeled samples returned by the labeling party;

adding at least some of the labeled samples returned by the labeling party to the labeled set;

taking several unlabeled samples out of the unlabeled subset and labeling them according to the labeled samples in the labeled set, and adding the resulting labeled samples to the labeled set;

retraining the risk control model according to the labeled set, until a specified condition is met.

An embodiment of this specification provides an apparatus for training a risk control model, in which a labeled set contains labeled samples and an unlabeled set contains unlabeled samples, the apparatus comprising:

a training module, which trains a risk control model according to the labeled set and retrains the risk control model according to the labeled set updated by the fourth processing module, until a specified condition is met;

an analysis module, which performs uncertainty analysis, based on the trained risk control model, on the unlabeled samples in the unlabeled set;

a first processing module, which takes several unlabeled samples out of the unlabeled set according to the analysis result and uses the unlabeled samples that were not taken out as an unlabeled subset;

a second processing module, which provides the unlabeled samples taken out to a labeling party for labeling and receives the labeled samples returned by the labeling party;

a third processing module, which adds at least some of the labeled samples returned by the labeling party to the labeled set;

the fourth processing module, which takes several unlabeled samples out of the unlabeled subset and labels them according to the labeled samples in the labeled set, and adds the resulting labeled samples to the labeled set.

An embodiment of this specification provides a device for training a risk control model, in which a labeled set contains labeled samples and an unlabeled set contains unlabeled samples; the device comprises one or more processors and a memory, the memory stores a program, and the program is configured to be executed by the one or more processors to perform the following steps:

As can be seen from the technical solutions provided above, in the embodiments of this specification the existing labeled samples form the labeled set, the samples that would have to be labeled under the prior art form the unlabeled set, and the risk control model is trained iteratively according to the labeled set. One iteration performs the following steps: based on the current risk control model (which was trained on the labeled set in the previous iteration), perform uncertainty analysis on the unlabeled samples in the unlabeled set; take several unlabeled samples out of the unlabeled set according to the analysis result, and use the unlabeled samples that were not taken out as the unlabeled subset; provide the samples taken out to a labeling party for labeling, and receive the labeled samples the labeling party returns; add at least some of the returned labeled samples to the labeled set; take several unlabeled samples out of the unlabeled subset and label them according to the labeled samples in the labeled set, and add the resulting labeled samples to the labeled set; retrain the risk control model according to the labeled set. The next iteration then begins, and iteration can stop once a specified condition is met. In this way, each iteration uses the limited labeled samples in the labeled set to label several samples taken from the unlabeled set, updates the labeled set, and retrains the risk control model on the updated labeled set. Once the specified condition is met, iteration stops and no further samples need to be taken from the unlabeled set for labeling. With the embodiments of this specification, training a risk control model whose risk identification accuracy meets the requirement usually does not require labeling every sample in the unlabeled set, so such a model can be trained at a lower cost.

Brief Description of the Drawings

To describe the technical solutions in the embodiments of this specification or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some of the embodiments recorded in this specification, and a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a flowchart of a method for training a risk control model provided by an embodiment of this specification;

Fig. 2 is a schematic diagram of an apparatus for training a risk control model provided by an embodiment of this specification;

Fig. 3 is a schematic diagram of a device for training a risk control model provided by an embodiment of this specification.

Detailed Description

In the prior art, technical staff generally tend to train risk control models with traditional supervised learning algorithms, which usually require a massive number of labeled samples as input. Taking cash-out risk as an example, however, payers and payees whose purpose is cash-out will usually not admit their true intention, so a massive number of transfers has to be labeled manually one by one (for example, transfers suspected of cash-out risk are labeled 1, and transfers not suspected of cash-out risk are labeled 0). That is, in many scenarios typified by cash-out risk, the need to label a massive number of samples makes training a risk control model costly. How to train a risk control model at lower cost has therefore become a technical problem that urgently needs to be solved in this field.

In the embodiments of this specification, the existing labeled samples form the labeled set, and the samples that would have to be labeled under the prior art form the unlabeled set. A risk control model is first trained on the limited labeled samples in the labeled set, and the model is then updated iteratively. In each iteration, several unlabeled samples are taken from the unlabeled set and labeled, based on the labeled set, to update the labeled set; the risk control model is then retrained on the updated labeled set, which completes one iteration.

As the number of iterations increases, the labeled set usually contains more and more labeled samples and the unlabeled set usually contains fewer and fewer unlabeled samples. When the specified condition is met, iteration stops, the labeled set stops growing, and the unlabeled set stops shrinking. In general, a risk control model whose risk identification accuracy meets the requirement can be trained without labeling all of the samples in the unlabeled set. The embodiments of this specification thus lower the cost of training such a model.

To help those skilled in the art better understand the technical solutions in this specification, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings of one or more embodiments. Evidently, the described embodiments are only some, not all, of the embodiments of this specification. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of this specification without creative effort shall fall within the scope of protection of this specification.

The technical solutions provided by the embodiments of this specification are described in detail below with reference to the drawings.

Fig. 1 is a flowchart of a method for training a risk control model provided by an embodiment of this specification, which includes the following steps:

S100: Train a risk control model according to the labeled set.

The method can be executed by any device with enough data processing capability to train a risk control model (referred to herein as the processing device), such as a server, a personal computer, or a mobile phone.

In the prior art, the number of labeled samples is usually limited, and a large number of unlabeled samples must be labeled before training yields a risk control model whose risk identification accuracy meets the requirement. For example, suppose technical staff believe that 100,000 labeled samples are needed as input to train such a model, while only 1,000 labeled samples exist; a further 99,000 unlabeled samples would then have to be labeled.

In the embodiments of this specification, the existing labeled samples form the labeled set, and the samples that would have to be labeled under the prior art form the unlabeled set. In the example above, the 1,000 labeled samples form the labeled set and the 99,000 unlabeled samples form the unlabeled set.

Note that the core idea of the invention is to take unlabeled samples out of the unlabeled set gradually, over successive training iterations, and label them to update the labeled set. Throughout training, therefore, both the labeled samples in the labeled set and the unlabeled samples in the unlabeled set keep changing. Continuing the example, before the method shown in Fig. 1 is executed, the labeled set contains 1,000 labeled samples and the unlabeled set contains 99,000 unlabeled samples. As the training iterations proceed, the labeled set usually contains more and more labeled samples and the unlabeled set fewer and fewer unlabeled samples. Once iterative training stops, the labeled samples in the labeled set and the unlabeled samples in the unlabeled set usually no longer change.

In step S100, the labeled samples in the labeled set can be used as input to train the risk control model with a supervised machine learning algorithm (such as a decision tree algorithm or a random forest algorithm).

S102: Based on the trained risk control model, perform uncertainty analysis on the unlabeled samples in the unlabeled set.

S104: Take several unlabeled samples out of the unlabeled set according to the analysis result, and use the unlabeled samples that were not taken out as the unlabeled subset.

In the embodiments of this specification, performing uncertainty analysis on an unlabeled sample based on the risk control model amounts to analyzing how confidently the model can determine whether the sample is risky. The more uncertain an unlabeled sample is, the harder it is for the model to determine whether the sample is risky.

Specifically, in step S102, an uncertainty score can be computed, based on the trained risk control model, for each unlabeled sample in the unlabeled set. The uncertainty score of a sample characterizes how difficult it is for the trained model to identify the risk of that sample: the higher the score, the harder the identification.

In step S104, the unlabeled samples whose uncertainty score exceeds a first specified threshold can be taken out of the unlabeled set. The first specified threshold can be set as needed.

That is, the samples taken out of the unlabeled set in step S104 according to the analysis result are the samples that the current risk control model finds hard to identify. Taking these samples out, labeling them, and adding them to the labeled set for training helps improve the model's risk identification accuracy.

Further, the uncertainty score of each unlabeled sample in the unlabeled set can be computed as follows: use the trained risk control model to identify each unlabeled sample in the unlabeled set and obtain its risk probability; then, for each unlabeled sample, take the negative of the absolute difference between its risk probability and 0.5 as its uncertainty score.

For example, suppose the trained risk control model identifies two unlabeled samples, A and B. The risk probability of sample A is 0.9 (the sample can be considered risky), and the risk probability of sample B is 0.4 (it is hard to determine whether the sample is risky). The uncertainty score of sample A is then -|0.9 - 0.5| = -0.4, and that of sample B is -|0.4 - 0.5| = -0.1.
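The calculation in this example can be sketched in a few lines of Python. This is only an illustration of the formula stated above (the negative of the absolute difference between the risk probability and 0.5); the function and variable names are assumptions introduced here, not names from the specification.

```python
def uncertainty(risk_probability: float) -> float:
    """Negative absolute difference between the risk probability and 0.5."""
    return -abs(risk_probability - 0.5)

# Risk probabilities of samples A and B from the example above.
risk_probabilities = {"A": 0.9, "B": 0.4}
scores = {name: uncertainty(p) for name, p in risk_probabilities.items()}

# B lies closer to 0.5, so its uncertainty is higher (less negative) than A's.
most_uncertain = max(scores, key=scores.get)
```

Under this definition the value ranges from -0.5 (the model is certain) to 0 (the risk probability is exactly 0.5).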

In addition, in step S104, for each predetermined sample type, several unlabeled samples of that type can be taken out of the unlabeled set according to the uncertainty scores of the unlabeled samples of that type. The sample type can be determined by the source of the sample or by other criteria.

Also, before step S104, the similarity between the unlabeled samples in the unlabeled set can be computed. The similarity between two unlabeled samples can be calculated from the distance between their feature vectors.

Then, in step S104, a characterization value can be computed for each unlabeled sample from the similarities between the unlabeled samples in the unlabeled set and from each sample's uncertainty score. For each unlabeled sample, the more similar it is to the other unlabeled samples, the lower its characterization value; and the lower its uncertainty score, the lower its characterization value. The unlabeled samples whose characterization value exceeds a second specified threshold are taken out of the unlabeled set. The second specified threshold can be set as needed.

Clearly, taking unlabeled samples of multiple sample types out of the unlabeled set for labeling, and/or taking out unlabeled samples that are not very similar to one another, makes the labeled samples used for retraining more diverse (so each labeled sample is also more representative), which yields a better training effect.
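The specification does not give a formula for the characterization value; it only requires that the value fall as similarity to other samples rises and as uncertainty falls. The sketch below is one possible form that satisfies both requirements. The similarity function 1 / (1 + Euclidean distance), the subtraction of the mean similarity, the weight `alpha`, and the threshold value are all assumptions made for illustration.

```python
import math

def similarity(a, b):
    """Similarity in (0, 1] derived from Euclidean distance (assumed form)."""
    return 1.0 / (1.0 + math.dist(a, b))

def characterization_values(features, uncertainties, alpha=1.0):
    """Higher uncertainty raises the value; higher average similarity
    to the other samples lowers it (assumed combination)."""
    values = []
    for i, fi in enumerate(features):
        others = [similarity(fi, fj) for j, fj in enumerate(features) if j != i]
        mean_similarity = sum(others) / len(others) if others else 0.0
        values.append(uncertainties[i] - alpha * mean_similarity)
    return values

# Two near-duplicate samples and one isolated sample, all equally uncertain.
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
uncertainties = [-0.1, -0.1, -0.1]
values = characterization_values(features, uncertainties)

second_threshold = -0.5  # hypothetical second specified threshold
selected = [i for i, v in enumerate(values) if v > second_threshold]
```

With equal uncertainty, only the isolated sample exceeds the threshold, which reflects the stated goal of favoring samples that are not very similar to one another.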

Note that samples are taken out of the unlabeled set without replacement: once an unlabeled sample is taken out of the unlabeled set, it is removed from that set.
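Threshold-based selection without replacement, as described for step S104, can be sketched as follows. The sample identifiers, the score values, and the threshold of -0.2 are hypothetical.

```python
def take_uncertain(unlabeled, scores, first_threshold):
    """Split the unlabeled set into the samples taken out for labeling and
    the remaining unlabeled subset. `scores` maps sample id to uncertainty."""
    taken = [s for s in unlabeled if scores[s] > first_threshold]
    taken_ids = set(taken)
    remaining = [s for s in unlabeled if s not in taken_ids]
    return taken, remaining

unlabeled = ["t1", "t2", "t3", "t4"]
scores = {"t1": -0.40, "t2": -0.10, "t3": -0.05, "t4": -0.30}
taken, unlabeled_subset = take_uncertain(unlabeled, scores, first_threshold=-0.2)
```

Every sample ends up in exactly one of the two returned lists, so a sample that has been taken out can never be selected again from the subset.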

S106: Provide the unlabeled samples taken out to the labeling party for labeling, and receive the labeled samples returned by the labeling party.

In the embodiments of this specification, the samples taken out can be provided to the labeling party for manual labeling, and/or provided to the labeling party so that the labeling party labels the samples it receives according to a predetermined labeling rule.

The labeling rule can be determined in advance as follows: monitor the manual labeling operations performed during manual labeling, and derive the labeling rule from the monitored operations. Specifically, a decision tree can be trained on the monitored manual labeling operations, and the trained decision tree is used as the labeling rule.

S108: Add at least some of the labeled samples returned by the labeling party to the labeled set.

Adding at least some of the returned labeled samples to the labeled set in step S108 accomplishes one update of the labeled set.

In step S108, the returned labeled samples of sample types other than a specified sample type are added to the labeled set. For each returned labeled sample of the specified sample type, whether to confirm the sample is judged from the labels it has received so far. If the sample is confirmed, it is relabeled according to the labels it has received and added to the labeled set; otherwise, it is treated as an unlabeled sample again and added to the unlabeled subset.

Note that samples of the specified sample type are usually those the labeling party is prone to mislabel. Such samples therefore usually need to be labeled several times, and the single final label of a sample is determined from the multiple labels it has received. A sample of the specified sample type is not added to the labeled set as a labeled sample until its final label has been determined.

Further, for each returned labeled sample of the specified sample type, the judgment on whether to confirm the sample can be made as follows: determine whether the number of labels the sample has received reaches a specified number; if so, confirm the sample, and otherwise decline to confirm it.

If the sample is confirmed, the label that occurs most often among the labels the sample has received is used as the sample's final label.
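The confirmation rule for samples of the specified sample type can be sketched as a small function. The function name and the required count of 3 are assumptions; the specification does not say how ties between equally frequent labels are resolved, and this sketch simply keeps whichever of the tied labels was seen first.

```python
from collections import Counter

def resolve_label(labels, required_count):
    """Return the majority label once enough labels have been collected,
    or None when the sample should return to the unlabeled subset."""
    if len(labels) < required_count:
        return None  # not confirmed yet
    return Counter(labels).most_common(1)[0][0]

confirmed = resolve_label([1, 0, 1], required_count=3)  # enough labels: majority wins
pending = resolve_label([1, 0], required_count=3)       # too few labels so far
```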

S110: According to the labeled samples in the labeled set, take several unlabeled samples out of the unlabeled subset and label them, and add the resulting labeled samples to the labeled set.

In step S110, the idea of semi-supervised learning can be borrowed: according to the labeled samples in the labeled set, several unlabeled samples are taken out of the unlabeled subset and labeled.

The basic idea of semi-supervised learning is to label some unlabeled data from the labeled samples, based on assumptions about the data distribution. These assumptions generally include the smoothness assumption, the cluster assumption, and the manifold assumption.

Take the smoothness assumption as an example. It states that two samples that lie close together in a dense data region have similar labels; that is, when two samples are connected by an edge within a dense data region, they very probably share the same label, whereas when two samples are separated by a sparse data region, their labels tend to differ. Based on the smoothness assumption, several unlabeled samples in the unlabeled subset can be labeled accurately according to the labeled samples in the labeled set.
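One simple way to act on the smoothness assumption is to copy the label of the nearest labeled sample, but only when that sample lies within a small radius. The specification does not prescribe this technique; the nearest-neighbor rule, the `radius` parameter, and the toy data below are assumptions chosen for illustration.

```python
import math

def propagate_labels(labeled, unlabeled, radius):
    """labeled: non-empty list of (features, label); unlabeled: list of features.
    Returns (newly_labeled, still_unlabeled)."""
    newly_labeled, still_unlabeled = [], []
    for x in unlabeled:
        nearest_features, nearest_label = min(
            labeled, key=lambda item: math.dist(x, item[0]))
        if math.dist(x, nearest_features) <= radius:
            newly_labeled.append((x, nearest_label))  # close enough: share the label
        else:
            still_unlabeled.append(x)                 # too far: leave unlabeled
    return newly_labeled, still_unlabeled

labeled = [((0.0, 0.0), 0), ((5.0, 5.0), 1)]
unlabeled = [(0.2, 0.1), (4.9, 5.1), (2.5, 2.5)]
newly_labeled, still_unlabeled = propagate_labels(labeled, unlabeled, radius=0.5)
```

The sample midway between the two labeled points stays unlabeled, which matches the idea that samples in sparse regions should not inherit a label.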

As can be seen, the labeled set is usually updated twice in one training iteration. Steps S102 to S108 perform the first update of the labeled set, which borrows the idea of active learning; step S110 performs the second update, which borrows the idea of semi-supervised learning.

S112: Retrain the risk control model according to the labeled set, until the specified condition is met.

In the embodiments of this specification, before step S112, different first training weights can be assigned to the manually labeled samples in the labeled set, to the labeled samples labeled according to the labeling rule, and to the other labeled samples. In step S112, the risk control model can then be retrained according to the first training weights of these three kinds of labeled samples. For example, the first training weight can be 8 for the manually labeled samples, 4 for the samples labeled according to the labeling rule, and 1 for the other labeled samples. The larger the first training weight of a labeled sample, the more that sample contributes to the training result (that is, the model learns from it with greater emphasis).

Further, for each labeled sample in the labeled set, a second training weight can be determined from the sample's first training weight and from the time interval between the current time and the time at which the sample was labeled. The larger the first training weight, the larger the second training weight; and the shorter the time interval, the larger the second training weight. The risk control model is then retrained according to the second training weights of the manually labeled samples, the samples labeled according to the labeling rule, and the other labeled samples. The larger the second training weight of a labeled sample, the more that sample contributes to the training result (that is, the model learns from it with greater emphasis).
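The two kinds of training weight can be sketched as follows. The first weights 8, 4, and 1 are the example values given above. The specification does not state how the time interval enters the second weight, so the decay used here (the first weight divided by one plus the number of elapsed days) is purely an assumption that satisfies the two stated monotonic requirements; the source names are likewise hypothetical.

```python
# Example first training weights from the text, keyed by how a sample was labeled.
FIRST_WEIGHTS = {"manual": 8.0, "rule": 4.0, "other": 1.0}

def second_weight(source, days_since_labeled):
    """Assumed form: grows with the first weight, shrinks as the label ages."""
    return FIRST_WEIGHTS[source] / (1.0 + days_since_labeled)

w_manual_today = second_weight("manual", 0)
w_manual_old = second_weight("manual", 3)
w_rule_today = second_weight("rule", 0)
```

Weights of this kind would typically be passed to the training algorithm as per-sample weights, where the chosen algorithm supports them.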

In the embodiments of this specification, one training iteration is complete once the risk control model has been retrained on the twice-updated labeled set. When the specified condition is met, no further iteration is started; training ends and the current risk control model is output as the training result.

The specified condition can be set as needed. For example, it can be that the number of times the risk control model has been trained reaches a predetermined number. As another example, it can be that the uncertainty score of every unlabeled sample in the unlabeled set is no greater than the first specified threshold described above.

With the method for training a risk control model shown in Fig. 1, as the number of iterations increases, the labeled set contains more and more labeled samples and the unlabeled set contains fewer and fewer unlabeled samples. When the specified condition is met, iteration stops, the labeled set stops growing, and the unlabeled set stops shrinking. In general, a risk control model whose risk identification accuracy meets the requirement can be trained without labeling all of the samples in the unlabeled set. The embodiments of this specification thus lower the cost of training such a model.
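The overall flow of steps S100 to S112 can be sketched as a loop. This is a minimal illustration under stated assumptions, not the patented implementation: the hooks `train`, `risk_probability`, `ask_labeler`, and `propagate` are hypothetical callables supplied by the caller, and the toy one-dimensional model exists only to make the sketch runnable. The loop stops on either example condition from the text, a round limit or no sample exceeding the first specified threshold.

```python
def train_iteratively(labeled, unlabeled, train, risk_probability,
                      ask_labeler, propagate, first_threshold, max_rounds):
    """labeled: list of (x, y); unlabeled: list of x. Hooks are hypothetical."""
    model = train(labeled)                                              # S100
    for _ in range(max_rounds):
        scores = [-abs(risk_probability(model, x) - 0.5) for x in unlabeled]   # S102
        taken = [x for x, s in zip(unlabeled, scores) if s > first_threshold]  # S104
        if not taken:
            break  # no uncertainty exceeds the first specified threshold
        subset = [x for x, s in zip(unlabeled, scores) if s <= first_threshold]
        labeled = labeled + [(x, ask_labeler(x)) for x in taken]        # S106, S108
        newly_labeled, subset = propagate(labeled, subset)              # S110
        labeled = labeled + newly_labeled
        unlabeled = subset
        model = train(labeled)                                          # S112
    return model, labeled, unlabeled

# Toy 1-D hooks, for illustration only.
def train(labeled):
    positives = [x for x, y in labeled if y == 1]
    negatives = [x for x, y in labeled if y == 0]
    # The "model" is a decision boundary halfway between the class means.
    return (sum(positives) / len(positives) + sum(negatives) / len(negatives)) / 2

def risk_probability(boundary, x):
    return min(1.0, max(0.0, 0.5 + (x - boundary) / 10))

model, labeled, unlabeled = train_iteratively(
    labeled=[(0.0, 0), (10.0, 1)],
    unlabeled=[1.0, 4.0, 6.0, 9.0],
    train=train,
    risk_probability=risk_probability,
    ask_labeler=lambda x: int(x > 5),               # stands in for the labeling party
    propagate=lambda labeled, subset: ([], subset),  # no-op semi-supervised step
    first_threshold=-0.2,
    max_rounds=5,
)
```

In this toy run only the two samples near the boundary are sent for labeling; the two samples the model is already confident about are never labeled, which is the cost saving the method aims for.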

In addition, the invention can achieve the following technical effects:

1. In steps S102 to S108, the idea of active learning can be borrowed: based on the current risk control model, several unlabeled samples that the current model finds hard to identify are taken out of the unlabeled set and provided to the labeling party for labeling, and the labeled samples the labeling party returns are used to retrain the model. This reinforces the model precisely where it is weak and markedly improves its risk identification accuracy.

2. In step S110, the idea of semi-supervised learning can be borrowed: based on common assumptions about the data distribution, several unlabeled samples in the unlabeled set are labeled accurately according to the labeled samples in the labeled set, so that further labeled samples can be added to the labeled set. This makes full use of the existing labeled samples.

In addition, in the embodiments of this specification there can be two types of label, a first label and a second label; that is, every labeled sample carries either the first label or the second label. For example, the first label can be "1", and a sample labeled with the first label is one confirmed to be risky; the second label can be "0", and a sample labeled with the second label is one not confirmed to be risky.

In step S108, the returned samples labeled with the first label can be added to the labeled set, and the returned samples labeled with the second label can be treated as unlabeled samples again and added to the unlabeled subset.
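This routing of returned samples can be sketched as follows. The constant and function names are assumptions; the label values 1 and 0 follow the example given above.

```python
FIRST_LABEL, SECOND_LABEL = 1, 0  # example label values from the text

def route_returned(returned, labeled_set, unlabeled_subset):
    """returned: list of (sample, label) pairs from the labeling party."""
    for sample, label in returned:
        if label == FIRST_LABEL:
            labeled_set.append((sample, label))  # confirmed risky: keep the label
        else:
            unlabeled_subset.append(sample)      # not confirmed: unlabeled again
    return labeled_set, unlabeled_subset

labeled_set, unlabeled_subset = route_returned(
    returned=[("t2", 1), ("t3", 0)],
    labeled_set=[("t0", 1)],
    unlabeled_subset=["t1"],
)
```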

In step S110, a positive and unlabeled learning (PU Learning) algorithm may be used: according to the labeled samples carrying the first label in the labeled set, several samples that can be labeled with the second label are determined from the to-be-labeled subset, and the determined samples are labeled with the second label; the resulting labeled samples carrying the second label are added to the labeled set. The PU Learning algorithm is in fact a special kind of semi-supervised learning algorithm.

As is well known to those skilled in the art, based on the idea of the PU Learning algorithm, several samples that can be labeled with the second label may be determined from the to-be-labeled subset in various ways. For example, the labeled samples carrying the first label in the labeled set may be added to a set P, and the samples to be labeled in the to-be-labeled subset may be added to a set U. Each sample in U is provisionally labeled with the second label, and a separate classification model is trained using P and U; for example, a Bayesian algorithm may be used to train a Bayes classifier on P and U as the classification model. Then the classification model is used to classify each sample in U, and the samples whose classification result is the second label are determined to be samples that can be labeled with the second label.
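The P/U example above can be sketched in plain Python. The sketch assumes binary feature vectors and a Bernoulli naive Bayes classifier with Laplace smoothing; the specification only says "a Bayesian algorithm", so the feature encoding, the smoothing, and all function names are illustrative assumptions.

```python
import math


def train_bernoulli_nb(samples, labels, n_features):
    """Fit a Bernoulli naive Bayes model with Laplace smoothing."""
    model = {}
    for c in (0, 1):
        rows = [s for s, y in zip(samples, labels) if y == c]
        prior = (len(rows) + 1) / (len(samples) + 2)
        probs = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                 for j in range(n_features)]
        model[c] = (prior, probs)
    return model


def predict(model, x):
    """Return the class (0 or 1) with the higher log-posterior score."""
    scores = {}
    for c, (prior, probs) in model.items():
        score = math.log(prior)
        for xj, p in zip(x, probs):
            score += math.log(p if xj else 1.0 - p)
        scores[c] = score
    return max(scores, key=scores.get)


def pu_select_second_label(P, U):
    """Provisionally label U as 0, train on P and U, keep U samples classified 0."""
    samples = P + U
    labels = [1] * len(P) + [0] * len(U)
    model = train_bernoulli_nb(samples, labels, len(samples[0]))
    return [u for u in U if predict(model, u) == 0]


P = [[1, 1, 0], [1, 1, 1], [1, 0, 1]]             # samples with the first label
U = [[0, 0, 0], [0, 0, 1], [1, 1, 1], [0, 1, 0]]  # to-be-labeled subset
second_label_samples = pu_select_second_label(P, U)
```

In this toy data the sample `[1, 1, 1]` resembles the positives, so it is not given the second label and stays in the to-be-labeled subset.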

In addition, after the flow of training the risk control model shown in FIG. 1 ends, the resulting risk control model may be provided to a model acceptance party for testing. The model acceptance party may return the labeled samples generated during testing to the processing device, and the processing device adds the labeled samples returned by the model acceptance party to the labeled set.

Based on the method for trained air control model shown in FIG. 1, this specification embodiment also correspondence provides a kind of trained wind The device of model is controlled, as shown in Fig. 2, mark is concentrated comprising sample has been marked, to be marked concentrate includes sample to be marked, the dress Set including：

Training module 201 collects according to the mark, training air control model, and after being updated according to fourth processing module 206 Mark collection, re -training air control model, until meet specified requirements；

Analysis module 202, based on the obtained air control model of training, to it is described it is to be marked concentrate the sample to be marked for including into Row analysis of uncertainty；

First processing module 203 takes out several samples to be marked according to analysis result from the concentration to be marked, and will The sample to be marked that the concentration to be marked is not removed is as subset to be marked；

The sample to be marked of taking-up is supplied to mark side to be labeled, and receives mark side and return by Second processing module 204 The mark sample returned；

The sample that at least partly marked that the mark side returns is added to the mark collection by third processing module 205 In；

The fourth processing module 206 concentrates the mark sample for including, from the subset to be marked according to the mark Middle several samples to be marked of taking-up are labeled, and will be marked the obtained sample of mark and be added to the mark concentration.

The analysis module 202, based on the trained risk control model, calculates the uncertainty corresponding to each sample to be labeled in the to-be-labeled set. For each sample to be labeled, its uncertainty characterizes how difficult it is for the trained risk control model to perform risk identification on that sample: the higher the uncertainty, the more difficult the risk identification.
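The specification does not fix a formula for the uncertainty. One common choice, shown here purely as an assumption, is the binary entropy of the risk probability that the model predicts for a sample: it is highest when the model is undecided (probability 0.5) and zero when the model is certain.

```python
import math


def binary_entropy(p):
    """Uncertainty of a predicted risk probability p, in bits (assumed measure)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # the model is fully certain
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


# Hypothetical predicted risk probabilities for three samples to be labeled.
predicted = {"tx1": 0.5, "tx2": 0.9, "tx3": 0.6}
uncertainty = {name: binary_entropy(p) for name, p in predicted.items()}
```

Other measures, such as the distance of the probability from 0.5, would serve the same purpose.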

The first processing module 203 takes out of the to-be-labeled set the samples to be labeled whose uncertainty is greater than a first specified threshold.
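A minimal sketch of this threshold-based selection follows. The function name and the index-based return values are assumptions; the threshold value is whatever the implementer specifies.

```python
def split_by_uncertainty(uncertainties, first_threshold):
    """Return (taken, remaining) sample indices.

    'taken' holds samples whose uncertainty exceeds the first specified
    threshold; 'remaining' forms the to-be-labeled subset.
    """
    taken = [i for i, u in enumerate(uncertainties) if u > first_threshold]
    remaining = [i for i, u in enumerate(uncertainties) if u <= first_threshold]
    return taken, remaining


taken, remaining = split_by_uncertainty([0.9, 0.2, 0.6], first_threshold=0.5)
```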

The first processing module 203, before taking several samples to be labeled out of the to-be-labeled set according to the analysis result, calculates the similarity between the samples to be labeled in the to-be-labeled set; and calculates a characterization value for each sample to be labeled according to the similarity between the samples to be labeled in the to-be-labeled set and the uncertainty of each sample to be labeled. For each sample to be labeled, the more similar it is to the other samples to be labeled, the lower its characterization value; and the lower its uncertainty, the lower its characterization value. The first processing module 203 then takes out of the to-be-labeled set the samples to be labeled whose characterization value is greater than a second specified threshold.
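The specification states only the two monotonic relationships, not a formula. The sketch below assumes cosine similarity between feature vectors and defines the characterization value as the uncertainty multiplied by one minus the mean similarity to the other samples; both choices are illustrative assumptions that satisfy the stated relationships.

```python
import math


def cosine(a, b):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def characterization_values(features, uncertainties):
    """Assumed form: uncertainty * (1 - mean similarity to the other samples)."""
    n = len(features)
    values = []
    for i in range(n):
        sims = [cosine(features[i], features[j]) for j in range(n) if j != i]
        mean_sim = sum(sims) / len(sims) if sims else 0.0
        values.append(uncertainties[i] * (1.0 - mean_sim))
    return values


features = [[1, 0], [1, 0], [0, 1]]   # samples 0 and 1 are duplicates
values = characterization_values(features, [0.8, 0.8, 0.8])
```

With equal uncertainties, the two duplicate samples receive a lower value than the distinct one, so a second threshold between the two values would send only the distinct sample for labeling.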

The first processing module 203, for each predetermined sample type, takes several samples to be labeled of that sample type out of the to-be-labeled set according to the uncertainty of the samples to be labeled of that sample type.
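One way to read this per-type selection is a fixed quota per sample type, filled with the most uncertain samples of that type. The quota `k`, the tuple layout, and the type names below are assumptions for illustration.

```python
from collections import defaultdict


def take_per_type(samples, k):
    """samples: list of (sample_id, sample_type, uncertainty) tuples.

    Returns a dict mapping each sample type to the ids of its k most
    uncertain samples (fewer if the type has fewer than k samples).
    """
    by_type = defaultdict(list)
    for sample_id, sample_type, unc in samples:
        by_type[sample_type].append((unc, sample_id))
    taken = {}
    for sample_type, items in by_type.items():
        items.sort(key=lambda item: item[0], reverse=True)
        taken[sample_type] = [sample_id for _, sample_id in items[:k]]
    return taken


pool = [("a", "transfer", 0.9), ("b", "transfer", 0.4),
        ("c", "payment", 0.7), ("d", "transfer", 0.6)]
taken = take_per_type(pool, k=2)
```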

The second processing module 204 provides the taken-out samples to be labeled to the labeling party for manual labeling; and/or provides the taken-out samples to be labeled to the labeling party, so that the labeling party labels the received samples according to a predetermined labeling rule.

Predetermining the labeling rule specifically includes:

monitoring the manual labeling operations performed during manual labeling;

determining the labeling rule according to the monitored manual labeling operations.

The third processing module 205 adds to the labeled set the labeled samples returned by the labeling party that are of sample types other than a specified sample type; and, for each labeled sample of the specified sample type returned by the labeling party, judges, according to each labeling performed on that sample, whether to confirm the labeled sample. If so, it relabels the sample according to each labeling performed on it and adds the sample to the labeled set; otherwise, it adds the sample back to the to-be-labeled subset as a sample to be labeled.
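The specification does not say how the repeated labelings of one sample are reconciled. A simple possibility, shown here as an assumption, is to confirm the sample only when a large enough share of its labelings agree, and to use the agreed label as the final label; the agreement ratio `min_agreement` is a hypothetical parameter.

```python
from collections import Counter


def resolve_labels(labels, min_agreement=0.75):
    """Return the confirmed label, or None if the sample is not confirmed.

    labels: every label given to one sample of the specified sample type.
    """
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    if count / len(labels) >= min_agreement:
        return label   # confirmed: relabel with the agreed label
    return None        # not confirmed: return it to the to-be-labeled subset


confirmed = resolve_labels([1, 1, 1, 1])
unconfirmed = resolve_labels([1, 0, 1, 0])
```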

For each labeled sample, the label of that sample is the first label or the second label;

The third processing module 205 adds to the labeled set the labeled samples returned by the labeling party that carry the first label; and, before the fourth processing module 206 takes several samples to be labeled out of the to-be-labeled set and labels them according to the labeled samples in the labeled set, adds the labeled samples returned by the labeling party that carry the second label back to the to-be-labeled subset as samples to be labeled.

The fourth processing module 206 uses a positive and unlabeled learning (PU Learning) algorithm to determine, according to the labeled samples carrying the first label in the labeled set, several samples in the to-be-labeled subset that can be labeled with the second label, and labels the determined samples with the second label; the resulting labeled samples carrying the second label are added to the labeled set.

The fourth processing module 206, before the risk control model is retrained according to the labeled set, assigns different first training weight values to the manually labeled samples in the labeled set, the labeled samples labeled according to the labeling rule, and the other labeled samples; and retrains the risk control model according to the first training weight values respectively corresponding to the manually labeled samples in the labeled set, the labeled samples labeled according to the labeling rule, and the other labeled samples.
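The assignment of first training weight values can be sketched as a lookup by label source. The source names and the numeric weights below are assumptions; the specification only requires that the three kinds of labeled samples receive different weights.

```python
# Hypothetical first training weight per label source (values are assumptions).
FIRST_WEIGHT_BY_SOURCE = {
    "manual": 1.0,  # labeled manually by the labeling party
    "rule": 0.7,    # labeled according to the labeling rule
    "other": 0.4,   # other labeled samples, e.g. labeled in step S110
}


def first_training_weights(label_sources):
    """Map each labeled sample's source to its first training weight value."""
    return [FIRST_WEIGHT_BY_SOURCE[source] for source in label_sources]


weights = first_training_weights(["manual", "other", "rule"])
```

The resulting list can be passed to any learner that accepts per-sample weights during training.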

The fourth processing module 206, for each labeled sample in the labeled set, determines a second training weight value for that sample according to its first training weight value and the time interval between the current time and the time at which the sample was labeled. The larger the first training weight value of the sample, the larger its second training weight value; the smaller the time interval of the sample, the larger its second training weight value. The risk control model is then retrained according to the second training weight values respectively corresponding to the manually labeled samples in the labeled set, the labeled samples labeled according to the labeling rule, and the other labeled samples.
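The specification gives only the two monotonic relationships for the second training weight value. An exponential time decay applied to the first weight satisfies both and is used below as an assumed form; the `decay` rate and the day-based interval are hypothetical.

```python
import math


def second_training_weight(first_weight, interval_days, decay=0.01):
    """Assumed form: first weight multiplied by an exponential time decay.

    A larger first weight or a smaller labeling-to-now interval both
    yield a larger second training weight value.
    """
    return first_weight * math.exp(-decay * interval_days)


recent = second_training_weight(1.0, interval_days=10)
stale = second_training_weight(1.0, interval_days=100)
```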

The specified condition specifically includes: the number of times the risk control model has been trained reaches a specified number.
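The stopping condition can be sketched as a loop that counts training runs. The callables `train` and `update_labeled_set` are placeholders for the training step and for the per-iteration update of the labeled set; their names and signatures are assumptions.

```python
def train_until(train, update_labeled_set, labeled_set, max_trainings):
    """Train, then alternate update/retrain until max_trainings runs are done."""
    model = train(labeled_set)
    trainings = 1
    while trainings < max_trainings:
        # One iteration: enlarge the labeled set, then retrain on it.
        labeled_set = update_labeled_set(model, labeled_set)
        model = train(labeled_set)
        trainings += 1
    return model, labeled_set, trainings


# Toy stand-ins: the "model" is just the size of the labeled set.
model, labeled, runs = train_until(
    train=lambda s: len(s),
    update_labeled_set=lambda m, s: s + [m],
    labeled_set=[0],
    max_trainings=3,
)
```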

Based on the method for Fig. 1 training air control models shown, this specification embodiment also correspondence provides a kind of trained air control The equipment of model, as shown in figure 3, mark is concentrated comprising sample has been marked, to be marked concentrate includes sample to be marked, the equipment packet One or more processors and memory are included, the memory has program stored therein, and is configured to by one or more of Processor executes following steps：

Collected according to the mark, training air control model；

The embodiments in this specification are described in a progressive manner. For the parts that are the same or similar across embodiments, reference may be made from one embodiment to another, and each embodiment focuses on its differences from the other embodiments. In particular, the device shown in FIG. 3 is described relatively briefly because it is substantially similar to the method embodiment; for the relevant details, see the corresponding description of the method embodiment.

In the 1990s, an improvement to a technology could be clearly distinguished as either a hardware improvement (for example, an improvement to a circuit structure such as a diode, transistor, or switch) or a software improvement (an improvement to a method flow). However, as technology has developed, many improvements to method flows today can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be implemented with a hardware entity module. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic function is determined by the user programming the device. Designers program a digital system onto a single PLD themselves, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of making integrated circuit chips by hand, this programming is nowadays mostly done with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must also be written in a specific programming language, called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); the most commonly used at present are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. Those skilled in the art should also understand that a hardware circuit implementing a logical method flow can be readily obtained merely by lightly logic-programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.

A controller may be implemented in any suitable manner. For example, a controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, besides implementing a controller purely as computer-readable program code, the method steps can be logic-programmed so that the controller achieves the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for implementing various functions may also be regarded as structures within the hardware component. Alternatively, the means for implementing various functions may be regarded both as software modules implementing a method and as structures within the hardware component.

The systems, apparatuses, modules, or units described in the above embodiments may be implemented by a computer chip or entity, or by a product having a certain function. A typical implementing device is a computer. Specifically, the computer may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

For convenience of description, the above apparatus is described with its functions divided into various units. Of course, when implementing this specification, the functions of the units may be implemented in one or more pieces of software and/or hardware.

Those skilled in the art should understand that the embodiments of this specification may be provided as a method, a system, or a computer program product. Therefore, this specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, this specification may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.

This specification is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to one or more embodiments of this specification. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include non-permanent memory in computer-readable media, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

It should also be noted that the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, commodity, or device that includes the element.

This specification may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform specific tasks or implement specific abstract data types. One or more embodiments of this specification may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.

The embodiments in this specification are described in a progressive manner. For the parts that are the same or similar across embodiments, reference may be made from one embodiment to another, and each embodiment focuses on its differences from the other embodiments. In particular, the system embodiment is described relatively briefly because it is substantially similar to the method embodiment; for the relevant details, see the corresponding description of the method embodiment.

Specific embodiments of this specification have been described above. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the accompanying drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

The foregoing is merely one or more embodiments of this specification and is not intended to limit this specification. For those skilled in the art, one or more embodiments of this specification may have various modifications and variations. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of one or more embodiments of this specification shall fall within the scope of the claims of this specification.

Claims

1. A method of training a risk control model, wherein a labeled set contains labeled samples and a to-be-labeled set contains samples to be labeled, the method comprising:

training a risk control model according to the labeled set;

performing, based on the trained risk control model, uncertainty analysis on the samples to be labeled in the to-be-labeled set;

taking several samples to be labeled out of the to-be-labeled set according to the analysis result, and taking the samples that were not taken out of the to-be-labeled set as a to-be-labeled subset;

according to the labeled samples in the labeled set, taking several samples to be labeled out of the to-be-labeled subset and labeling them, and adding the resulting labeled samples to the labeled set;

2. The method of claim 1, wherein performing, based on the trained risk control model, uncertainty analysis on the samples to be labeled in the to-be-labeled set specifically comprises:

calculating, based on the trained risk control model, the uncertainty corresponding to each sample to be labeled in the to-be-labeled set; wherein, for each sample to be labeled, its uncertainty characterizes how difficult it is for the trained risk control model to perform risk identification on that sample, and the higher the uncertainty, the more difficult the risk identification.

3. The method of claim 2, wherein taking several samples to be labeled out of the to-be-labeled set according to the analysis result specifically comprises:

taking out of the to-be-labeled set the samples to be labeled whose uncertainty is greater than a first specified threshold.

4. The method of claim 2, wherein, before taking several samples to be labeled out of the to-be-labeled set according to the analysis result, the method further comprises:

calculating the similarity between the samples to be labeled in the to-be-labeled set;

and wherein taking several samples to be labeled out of the to-be-labeled set according to the analysis result specifically comprises:

calculating a characterization value for each sample to be labeled according to the similarity between the samples to be labeled in the to-be-labeled set and the uncertainty of each sample to be labeled; wherein, for each sample to be labeled, the more similar it is to the other samples to be labeled, the lower its characterization value, and the lower its uncertainty, the lower its characterization value;

taking out of the to-be-labeled set the samples to be labeled whose characterization value is greater than a second specified threshold.

5. The method of claim 2, wherein taking several samples to be labeled out of the to-be-labeled set according to the analysis result specifically comprises:

for each predetermined sample type, taking several samples to be labeled of that sample type out of the to-be-labeled set according to the uncertainty of the samples to be labeled of that sample type.

6. The method of claim 1, wherein providing the taken-out samples to be labeled to a labeling party for labeling specifically comprises:

providing the taken-out samples to be labeled to the labeling party for manual labeling; and/or

providing the taken-out samples to be labeled to the labeling party, so that the labeling party labels the received samples according to a predetermined labeling rule.

7. The method of claim 6, wherein predetermining the labeling rule specifically comprises:

monitoring the manual labeling operations performed during manual labeling;

8. The method of claim 5, wherein adding at least some of the labeled samples returned by the labeling party to the labeled set specifically comprises:

adding to the labeled set the labeled samples returned by the labeling party that are of sample types other than a specified sample type; and

for each labeled sample of the specified sample type returned by the labeling party, judging, according to each labeling performed on that sample, whether to confirm the labeled sample;

if so, relabeling the sample according to each labeling performed on it, and adding the sample to the labeled set;

otherwise, adding the sample back to the to-be-labeled subset as a sample to be labeled.

9. The method of claim 1, wherein, for each labeled sample, the label of that sample is a first label or a second label;

adding at least some of the labeled samples returned by the labeling party to the labeled set specifically comprises:

adding to the labeled set the labeled samples returned by the labeling party that carry the first label;

and wherein, before taking several samples to be labeled out of the to-be-labeled set and labeling them according to the labeled samples in the labeled set, the method further comprises:

adding the labeled samples returned by the labeling party that carry the second label back to the to-be-labeled subset as samples to be labeled.

10. The method of claim 9, wherein, according to the labeled samples in the labeled set, taking several samples to be labeled out of the to-be-labeled subset and labeling them, and adding the resulting labeled samples to the labeled set, specifically comprises:

using a positive and unlabeled learning (PU Learning) algorithm to determine, according to the labeled samples carrying the first label in the labeled set, several samples in the to-be-labeled subset that can be labeled with the second label, and labeling the determined samples with the second label;

adding the resulting labeled samples carrying the second label to the labeled set.

11. The method of claim 6, wherein, before retraining the risk control model according to the labeled set, the method further comprises:

assigning different first training weight values to the manually labeled samples in the labeled set, the labeled samples labeled according to the labeling rule, and the other labeled samples;

and wherein retraining the risk control model according to the labeled set specifically comprises:

retraining the risk control model according to the first training weight values respectively corresponding to the manually labeled samples in the labeled set, the labeled samples labeled according to the labeling rule, and the other labeled samples.

12. The method of claim 11, wherein retraining the risk control model according to the first training weight values respectively corresponding to the manually labeled samples in the labeled set, the labeled samples labeled according to the labeling rule, and the other labeled samples specifically comprises:

for each labeled sample in the labeled set, determining a second training weight value for that sample according to its first training weight value and the time interval between the current time and the time at which the sample was labeled; wherein the larger the first training weight value of the sample, the larger its second training weight value, and the smaller the time interval of the sample, the larger its second training weight value;

retraining the risk control model according to the second training weight values respectively corresponding to the manually labeled samples in the labeled set, the labeled samples labeled according to the labeling rule, and the other labeled samples.

13. The method of claim 1, wherein the specified condition specifically comprises: the number of times the risk control model has been trained reaches a specified number.

14. An apparatus for training a risk control model, wherein a labeled set contains labeled samples and a to-be-labeled set contains samples to be labeled, the apparatus comprising:

a training module, which trains a risk control model according to the labeled set, and retrains the risk control model according to the labeled set updated by a fourth processing module, until a specified condition is met;

an analysis module, which, based on the trained risk control model, performs uncertainty analysis on the samples to be labeled in the to-be-labeled set;

a first processing module, which takes several samples to be labeled out of the to-be-labeled set according to the analysis result, and takes the samples that were not taken out of the to-be-labeled set as a to-be-labeled subset;

a second processing module, which provides the taken-out samples to be labeled to a labeling party for labeling and receives the labeled samples returned by the labeling party;

the fourth processing module, which, according to the labeled samples in the labeled set, takes several samples to be labeled out of the to-be-labeled subset and labels them, and adds the resulting labeled samples to the labeled set.

15. device as claimed in claim 14, the analysis module is waited for based on the air control model that training obtains described in calculating It each of includes the corresponding uncertainty of sample to be marked that mark, which is concentrated,；For each sample to be marked, the sample pair to be marked The air control model that the uncertainty characterization training answered obtains carries out the sample to be marked the complexity of risk identification, does not know Degree is higher, and risk identification is more difficult.

16. The apparatus of claim 15, wherein the first processing module takes out of the to-be-labeled set the samples to be labeled whose uncertainty is greater than a first specified threshold.
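One way to picture the uncertainty of claim 15 and the threshold selection of claim 16 is shown below. The claims do not fix a formula; this sketch assumes the model outputs a risk probability for each sample and uses binary entropy as the uncertainty measure. The names `binary_entropy` and `take_uncertain` and the example values are hypothetical.

```python
import math

def binary_entropy(p):
    """Assumed uncertainty measure: entropy (in bits) of a predicted risk probability."""
    if p <= 0.0 or p >= 1.0:
        return 0.0                      # a fully confident prediction has no uncertainty
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def take_uncertain(probs, first_threshold):
    """Split sample indices into those taken out (uncertainty above the threshold) and the rest."""
    taken = [i for i, p in enumerate(probs) if binary_entropy(p) > first_threshold]
    rest = [i for i, p in enumerate(probs) if binary_entropy(p) <= first_threshold]
    return taken, rest

# Predicted risk probabilities for four samples to be labeled (made-up values).
taken, rest = take_uncertain([0.02, 0.5, 0.97, 0.4], first_threshold=0.9)
```

Samples whose predicted probability is near 0.5 are the ones the model finds hardest to classify, so they are the ones handed to the labeling party.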

17. The apparatus of claim 15, wherein the first processing module, before taking several samples to be labeled out of the to-be-labeled set according to the analysis result, calculates the similarity between the samples to be labeled included in the to-be-labeled set; calculates a characterization value corresponding to each sample to be labeled according to the similarity between the samples to be labeled included in the to-be-labeled set and the uncertainty corresponding to each sample to be labeled, wherein, for each sample to be labeled, the more similar the sample is to the other samples to be labeled, the lower its characterization value, and the lower the uncertainty corresponding to the sample, the lower its characterization value; and takes out of the to-be-labeled set the samples to be labeled whose characterization value is greater than a second specified threshold.
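A characterization value with the two monotonic properties stated in claim 17 could, for instance, be built as below. The claim does not specify the formula; this sketch assumes a distance-based similarity `1 / (1 + distance)` and multiplies the uncertainty by one minus the mean similarity to the other samples. All function names and numbers are illustrative assumptions.

```python
import math

def similarity(a, b):
    """Assumed similarity in (0, 1]: identical points score 1, distant points approach 0."""
    return 1.0 / (1.0 + math.dist(a, b))

def characterization_values(samples, uncertainties):
    """Value falls as a sample gets more similar to the others or less uncertain."""
    values = []
    for i, x in enumerate(samples):
        others = [similarity(x, y) for j, y in enumerate(samples) if j != i]
        mean_sim = sum(others) / len(others) if others else 0.0
        values.append(uncertainties[i] * (1.0 - mean_sim))
    return values

def take_by_value(samples, uncertainties, second_threshold):
    values = characterization_values(samples, uncertainties)
    return [i for i, v in enumerate(values) if v > second_threshold]

points = [(0.0, 0.0), (0.0, 0.1), (5.0, 5.0)]   # two near-duplicates and one outlier
values = characterization_values(points, [0.9, 0.9, 0.9])
picked = take_by_value(points, [0.9, 0.9, 0.9], second_threshold=0.6)
```

With equal uncertainties, the isolated point receives the highest value, which avoids spending labeling effort on several near-identical samples.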

18. The apparatus of claim 15, wherein the first processing module, for each predetermined sample type, takes several samples to be labeled of that sample type out of the to-be-labeled set according to the uncertainties corresponding to the samples to be labeled of that sample type.

19. The apparatus of claim 14, wherein the second processing module provides the taken-out samples to be labeled to the labeling party for manual labeling; and/or provides the taken-out samples to be labeled to the labeling party, so that the labeling party labels the received samples to be labeled according to a predetermined labeling rule.

20. The apparatus of claim 19, wherein predetermining the labeling rule specifically comprises:

monitoring the manual labeling operations performed during manual labeling;

21. The apparatus of claim 18, wherein the third processing module adds to the labeled set the labeled samples returned by the labeling party that belong to sample types other than a specified sample type; and, for each labeled sample of the specified sample type returned by the labeling party, judges, according to each labeling performed on that labeled sample, whether to confirm the labeled sample; if so, re-labels the labeled sample according to each labeling performed on it and adds the labeled sample to the labeled set; otherwise, adds the labeled sample back to the to-be-labeled subset as a sample to be labeled.
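Claim 21 does not say how the successive labelings of a sample are combined. Purely as an illustration, the sketch below assumes a majority-vote rule: a sample is confirmed once it has been labeled at least `min_votes` times and the most common label reaches an agreement ratio of `min_agreement`. The function name and both parameters are hypothetical.

```python
from collections import Counter

def confirm(labels, min_votes=3, min_agreement=0.6):
    """Return the confirmed label, or None if the sample should go back to the subset."""
    if len(labels) < min_votes:
        return None                               # not labeled often enough yet
    label, count = Counter(labels).most_common(1)[0]
    if count / len(labels) >= min_agreement:
        return label                              # re-label with the majority label
    return None                                   # labelings disagree too much

confirmed = confirm([1, 1, 0])      # two of three labelings agree
undecided = confirm([1, 0, 1, 0])   # an even split is not confirmed
```

A return value of `None` corresponds to the branch in which the sample is added back to the to-be-labeled subset.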

22. The apparatus of claim 14, wherein, for each labeled sample, the label of the labeled sample is a first label or a second label;

the third processing module adds to the labeled set the labeled samples returned by the labeling party that carry the first label; and, before the fourth processing module takes several samples to be labeled out of the to-be-labeled set and labels them according to the labeled samples included in the labeled set, adds the labeled samples returned by the labeling party that carry the second label back to the to-be-labeled subset as samples to be labeled.

23. The apparatus of claim 22, wherein the fourth processing module uses a positive and unlabeled learning (PU Learning) algorithm to determine, from the to-be-labeled subset and according to the labeled samples carrying the first label that are included in the labeled set, several samples to be labeled that should carry the second label, and labels the determined samples with the second label; and adds the resulting labeled samples carrying the second label to the labeled set.
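Claim 23 names PU Learning without fixing a particular algorithm. The sketch below is a deliberately simplified stand-in for the first step of a two-step PU approach (picking "reliable" second-label samples): it assumes numeric feature vectors and treats samples far from the centroid of the first-label samples as second-label samples. The centroid heuristic, the function names, and the `min_distance` parameter are assumptions, not the claimed method.

```python
import math

def centroid(points):
    """Mean point of a list of equal-length feature tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def reliable_second_label(first_label_samples, subset, min_distance):
    """Assign the second label to subset samples far from the first-label centroid."""
    center = centroid(first_label_samples)
    newly, remaining = [], []
    for x in subset:
        if math.dist(x, center) > min_distance:
            newly.append(x)          # confidently unlike the first-label samples
        else:
            remaining.append(x)      # too close to call; stays in the subset
    return newly, remaining

first_label = [(1.0, 1.0), (1.2, 0.8)]
subset = [(1.0, 0.9), (5.0, 5.0), (1.3, 1.1), (-4.0, 0.0)]
newly, remaining = reliable_second_label(first_label, subset, min_distance=3.0)
```

Samples that stay in `remaining` are left in the to-be-labeled subset for later iterations rather than being forced into either label.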

24. The apparatus of claim 19, wherein the fourth processing module, before the risk control model is retrained according to the labeled set, assigns different first training weight values to the manually labeled samples, the samples labeled according to the labeling rule, and the other labeled samples included in the labeled set; and retrains the risk control model according to the first training weight values respectively corresponding to the manually labeled samples, the samples labeled according to the labeling rule, and the other labeled samples included in the labeled set.

25. The apparatus of claim 14, wherein the fourth processing module, for each labeled sample included in the labeled set, determines a second training weight value corresponding to the labeled sample according to the first training weight value corresponding to the labeled sample and the time interval between the current time and the time at which the labeled sample was labeled, wherein the larger the first training weight value corresponding to the labeled sample, the larger its second training weight value, and the smaller the time interval corresponding to the labeled sample, the larger its second training weight value; and retrains the risk control model according to the second training weight values respectively corresponding to the manually labeled samples, the samples labeled according to the labeling rule, and the other labeled samples included in the labeled set.
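The second training weight value of claim 25 only has to grow with the first weight and shrink with the age of the label. One function with those properties is an exponential decay, sketched below. The half-life form, the `half_life_days` parameter, and the example first weights per label source are assumptions made for illustration; the claims specify none of them.

```python
# Hypothetical first training weights per label source (values chosen arbitrarily).
FIRST_WEIGHT = {"manual": 1.0, "rule": 0.6, "other": 0.3}

def second_weight(first_weight, days_since_labeled, half_life_days=30.0):
    """Assumed form: the first weight halves every `half_life_days` days."""
    return first_weight * 0.5 ** (days_since_labeled / half_life_days)

# Per-sample weights that could be passed to a learner accepting sample weights.
samples = [("manual", 0), ("manual", 30), ("rule", 30), ("other", 90)]
weights = [second_weight(FIRST_WEIGHT[src], age) for src, age in samples]
```

Under this choice a freshly made manual label counts most, and an old label from another source counts least, which matches the ordering the claim requires.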

26. The apparatus of claim 14, wherein the specified condition specifically comprises: the number of times the risk control model has been trained reaches a specified number.

27. A device for training a risk control model, wherein a labeled set comprises labeled samples and a to-be-labeled set comprises samples to be labeled, the device comprising one or more processors and a memory, the memory storing a program configured to be executed by the one or more processors to perform the following steps:

training a risk control model according to the labeled set;