CN110334814A - Method and system for constructing a risk control model - Google Patents

Info

Publication number
CN110334814A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910587071.9A
Other languages
Chinese (zh)
Other versions
CN110334814B (en)
Inventor
金宏
王维强
赵闻飙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201910587071.9A
Publication of CN110334814A
Application granted
Publication of CN110334814B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Abstract

The present disclosure provides a method for efficiently constructing a risk control model, comprising: building a basic model library so that, when a new business is triggered, models are selected from the basic model library to build a default model; constructing a new model suited to the new business through automatic feature generation, automatic feature selection, and automatic parameter tuning; training the default model and the new model via transfer learning; automatically fusing the trained default model and the trained new model to generate a fusion model; using the trained default model as the online model and the trained new model and the fusion model as backup models; and, when one of the backup models outperforms the online model, replacing the online model with that backup model.

Description

Method and system for constructing a risk control model
Technical field
The present disclosure relates generally to risk control, and more particularly to risk control models.
Background
Risk control in internet finance involves the prevention and control of transaction and financial risks, including the identification of, and decisions on, account theft, fraud, marketing cheating, and junk registration.
Take paying with a mobile app at a supermarket checkout as an example: the risk control system needs to check whether the mobile account has been stolen, whether the user has been defrauded, whether there is cheating or illegal cash-out, and so on. In practice, different risk types pose different challenges for building and updating models.
At present, risk control models face two major problems during development and deployment.
The first problem is that the process of creating a new model is complicated: data cleaning, model training, and model deployment all require a great deal of manpower, and developing and deploying one model takes more than one month on average. As a result, models respond slowly to new business.
The second problem is that the model iteration cycle is long: updating an entire model requires a great deal of manpower and time for retraining and redeployment. This leads to poor risk resistance, because risks change constantly and are highly adversarial.
What is needed in the art is an efficient method and system for constructing risk control models that can quickly bring models online for new sites and new scenarios, laying the foundation for rapidly updating and iterating models against constantly changing risks.
Summary of the invention
To solve the above technical problems, the present disclosure provides a scheme for efficiently constructing a risk control model.
In one embodiment of the disclosure, a method for efficiently constructing a risk control model is provided, comprising: building a basic model library so that, when a new business is triggered, models are selected from the basic model library to build a default model; constructing a new model suited to the new business through automatic feature generation, automatic feature selection, and automatic parameter tuning; training the default model and the new model via transfer learning; automatically fusing the trained default model and the trained new model to generate a fusion model; using the trained default model as the online model and the trained new model and the fusion model as backup models; and, when one of the backup models outperforms the online model, replacing the online model with that backup model.
In another embodiment of the disclosure, building the default model further comprises: refining risk modules for each scenario; building a basic model for each risk module, and building the basic model library from the basic models; when a new business is triggered, selecting the corresponding basic models from the basic model library; and building a default model suited to the new business from the corresponding basic models.
In yet another embodiment of the disclosure, the refined risk modules include active party, passive party, device, environment, behavior, relationship, conflict, mutation, and FTG (Fraud to Gross).
In another embodiment of the disclosure, constructing the new model further comprises: obtaining an original variable pool; automatically generating different types of features from the original variables in the pool; selecting variables suited to the scenario from the original variable pool and the automatically generated features to produce a variable list; performing automatic parameter tuning for the variable list; and obtaining a new model suited to the scenario.
In yet another embodiment of the disclosure, automatic feature generation comprises transforming, computing on, and aggregating original features to generate new candidate features.
In another embodiment of the disclosure, automatic feature selection involves feature subset search and feature subset evaluation.
In yet another embodiment of the disclosure, automatic parameter tuning uses one of grid search, random search, and Bayesian optimization.
In one embodiment of the disclosure, a system for efficiently constructing a risk control model is provided, comprising: a default model building module, which builds a basic model library so that, when a new business is triggered, models are selected from the basic model library to build a default model; a new model construction module, which constructs a new model suited to the new business through automatic feature generation, automatic feature selection, and automatic parameter tuning; a model training module, which trains the default model and the new model via transfer learning; a fusion model generation module, which automatically fuses the trained default model and the trained new model to generate a fusion model; and an optimal model selection module, which uses the trained default model as the online model and the trained new model and the fusion model as backup models, and replaces the online model with a backup model when that backup model outperforms the online model.
In another embodiment of the disclosure, the default model building module further: refines risk modules for each scenario; builds a basic model for each risk module, and builds the basic model library from the basic models; when a new business is triggered, selects the corresponding basic models from the basic model library; and builds a default model suited to the new business from the corresponding basic models.
In yet another embodiment of the disclosure, the refined risk modules may include active party, passive party, device, environment, behavior, relationship, conflict, mutation, and FTG (Fraud to Gross).
In another embodiment of the disclosure, the new model construction module further: obtains an original variable pool; automatically generates different types of features from the original variables in the pool; selects variables suited to the scenario from the original variable pool and the automatically generated features to produce a variable list; performs automatic parameter tuning for the variable list; and obtains a new model suited to the scenario.
In yet another embodiment of the disclosure, the automatic feature generation performed by the new model construction module comprises transforming, computing on, and aggregating original features to generate new candidate features.
In another embodiment of the disclosure, the automatic feature selection performed by the new model construction module involves feature subset search and feature subset evaluation.
In yet another embodiment of the disclosure, the automatic parameter tuning performed by the new model construction module uses one of grid search, random search, and Bayesian optimization.
In one embodiment of the disclosure, a computer-readable storage medium storing instructions is provided; when executed, the instructions cause a machine to perform the method described above.
This summary is provided to introduce, in simplified form, some concepts that are further described in the detailed description below. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Brief description of the drawings
The above summary and the following detailed description of the disclosure are better understood when read in conjunction with the accompanying drawings. Note that the drawings are only examples of the claimed invention. In the drawings, identical reference numerals denote identical or similar elements.
Fig. 1 shows a flowchart of a method for efficiently constructing a risk control model according to one embodiment of the disclosure;
Fig. 2 shows a schematic diagram of a method for efficiently constructing a risk control model according to one embodiment of the disclosure;
Fig. 3 shows a flowchart of a process for building a default model according to one embodiment of the disclosure;
Fig. 4 shows a schematic diagram of a process for building a default model according to one embodiment of the disclosure;
Fig. 5 shows a flowchart of a process for automatically constructing a new model according to one embodiment of the disclosure;
Fig. 6 shows a schematic diagram of a process for automatically constructing a new model according to another embodiment of the disclosure;
Fig. 7 shows a block diagram of a system for efficiently constructing a risk control model according to one embodiment of the disclosure;
Fig. 8 shows a flowchart of a method for efficiently updating a risk control model according to one embodiment of the disclosure;
Fig. 9 shows a schematic diagram of a method for efficiently updating a risk control model according to one embodiment of the disclosure;
Fig. 10 shows a block diagram of a system for efficiently updating a risk control model according to one embodiment of the disclosure.
Detailed description
To make the above objects, features, and advantages of the disclosure easier to understand, specific embodiments of the disclosure are described in detail below with reference to the drawings.
Many specific details are set forth in the following description to provide a thorough understanding of the disclosure. The disclosure can, however, also be implemented in ways other than those described here, and it is therefore not limited by the specific embodiments disclosed below.
Risk control in internet finance involves the prevention and control of transaction and financial risks. While mobile payment makes people's lives more convenient, it also faces the particular challenge of online fraud. In the following, the disclosure takes online fraud as an example to describe how to efficiently construct a risk control model for preventing and controlling it. Those skilled in the art will appreciate that the risk control model constructed under the technical solution of the disclosure is not limited to online fraud and can be widely used for the prevention and control of various types of transaction and financial risks.
The disclosure proposes a scheme for efficiently constructing a risk control model. To address the problems in the art that creating a new model is complicated and that data cleaning, model training, and model deployment all consume a great deal of manpower, the technical solution of the disclosure relies on basic-model assembly, automatic model construction, and fusion-model generation. Through real-time competition between the online model and the backup models, it efficiently builds models that iterate as new risk features are continuously mined and algorithms are optimized.
The disclosure also proposes a scheme for efficiently updating a risk control model. The scheme enables rapid model update and iteration against constantly changing risks, greatly increasing the adaptive ability of models and thereby improving risk prevention and control. At the same time, automatic refitting (refit), automatic retraining (retrain), online learning, and the like greatly shorten the cycle of model training and deployment and improve the efficiency of model development.
The technical solution of the disclosure therefore provides not only a general technical framework and solution, but also model capabilities adapted to different stages of business development.
The method and system for efficiently constructing a risk control model according to the embodiments of the disclosure are described in detail below with reference to the drawings.
Method for efficiently constructing a risk control model
Fig. 1 shows a flowchart of a method 100 for efficiently constructing a risk control model according to one embodiment of the disclosure.
At 102, a basic model library is built so that, when a new business is triggered, models are selected from the basic model library to build a default model.
During risk control, numerous risk modules can be extracted, including active party, passive party, device, environment, behavior, relationship, conflict, mutation, and FTG (Fraud to Gross). These risk modules are characterized as variables, which can be divided into: historical-information summary variables (velocity class); derived variables, including individual change and group probability; and relationship variables.
Take account transfer transactions as an example. The two parties involved are the paying account and the receiving account. Besides the account's transaction behavior, a risk control event also includes information such as the account's operation behavior and logs. The behavior of the paying account, as the active party, includes paying, changing the password, adding friends, changing the avatar, and so on; the behavior of the receiving account, as the passive party, includes receiving payments, being reported, being added as a friend, and so on. For a transfer transaction, sequence mining can be performed on the behavior of the paying account and the receiving account, and the account's short-term behavior and long-term historical behavior can be mined over time windows of different lengths, in order to identify abnormal behavior sequences and thereby improve fraud prevention and control.
For each risk module or variable, different basic models can be built to form the basic model library. For example, for the active party among identity variables, basic models such as account maturity, information-leakage population, easily-stolen population, and sense-of-security population can be built from user gray lists, transaction history, and similar information. For behavior, basic models such as account operation behavior, verification interaction behavior, scenario shift behavior, and fund flow behavior can be built from the account's short-term and long-term historical behavior. For devices, basic models such as abnormal login device, abnormal operation device, abnormal tampering device, and trojan-running device can be built. Similarly, for addresses, basic models such as abnormal login address, abnormal operation address, abnormal tampering address, and fake address can be built. For relationships, basic models such as counterparty relationship, scenario relationship, content relationship, and location relationship can be built.
Those skilled in the art will understand that different basic models can be built for different risk modules or variables according to their types; this is not described further here.
When a new business or a new site is triggered, models can be freely selected from the basic model library to automatically build a default model suited to that business or site. Building the default model amounts to merging multiple related variables into one model. Those skilled in the art will understand that different models can be selected from the basic model library for different businesses or sites, so that different variables are merged.
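The selection step above can be pictured with a minimal sketch. Everything in it is an illustrative assumption rather than the method of the disclosure: the library is a plain dictionary keyed by risk module, each basic model is a toy scoring function, the event fields (`device_tampered`, `password_changes_24h`, `payee_is_friend`) are hypothetical, and the "merge" is a simple average of the selected scores.

```python
from typing import Callable, Dict, List

# A basic model is sketched as a function from an event to a risk score in [0, 1].
BasicModel = Callable[[dict], float]

# Hypothetical basic model library, keyed by risk module.
BASIC_MODEL_LIBRARY: Dict[str, BasicModel] = {
    "device": lambda e: 0.9 if e.get("device_tampered") else 0.1,
    "behavior": lambda e: min(1.0, e.get("password_changes_24h", 0) / 5),
    "relationship": lambda e: 0.2 if e.get("payee_is_friend") else 0.6,
}

def build_default_model(modules: List[str]) -> BasicModel:
    """Select basic models for the given risk modules and merge them.

    Averaging is only a stand-in for the merged modeling step.
    """
    selected = [BASIC_MODEL_LIBRARY[m] for m in modules]

    def default_model(event: dict) -> float:
        return sum(model(event) for model in selected) / len(selected)

    return default_model

# A new business that only needs the device and behavior modules.
default_model = build_default_model(["device", "behavior"])
score = default_model({"device_tampered": True, "password_changes_24h": 5})
```

A different business or site would pass a different list of module names and obtain a different default model from the same library.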
The process for building a default model according to one embodiment of the disclosure is described in detail below with reference to Fig. 3 and Fig. 4.
At 104, a new model suited to the new business is constructed through automatic feature generation, automatic feature selection, and automatic parameter tuning.
When a new model is constructed automatically, different variables can be learned or characterized automatically through feature engineering.
Automatic feature generation automatically constructs candidate features relevant to the target task from a data set, typically converting temporal and relational data sets into a feature matrix that can be used for machine learning.
The feature dimensionality of collected data is usually not very large, and directly collected features cannot fully capture all the information in the data, so new meaning has to be found by combining existing data. Feature derivation is therefore carried out in light of business needs: existing features are combined in certain ways to generate new, meaningful features, increasing the number of features so that more valuable features can be mined and a better model obtained.
Of course, sometimes there are too many features and dimensionality reduction is needed; in that case, what the features have in common is generally extracted from the many features to facilitate modeling.
In automatic feature generation, feature derivation operations are divided into transformation, computation, and aggregation; that is, original features are transformed, computed on, and aggregated to generate new candidate features. For example, a basic transformation can be applied to a single variable, such as a log transformation. As another example, variables can be derived by adding a time dimension, such as transaction data over six months. As a further example, operations can be applied to multiple variables, such as adding or multiplying two variables. Those skilled in the art will understand that there are many ways to derive features, and that the appropriate processing depends on the needs of the specific business scenario.
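The three kinds of derivation described above (aggregation over a time window, transformation of a single variable, and computation on several variables) can be illustrated with a short sketch. The transaction records, field names, and the 180-day window used to approximate "six months" are assumptions made for illustration only.

```python
import math
from collections import defaultdict
from datetime import date

# Hypothetical raw transaction records.
transactions = [
    {"account": "a1", "amount": 100.0, "day": date(2019, 6, 1)},
    {"account": "a1", "amount": 900.0, "day": date(2019, 1, 15)},
    {"account": "a1", "amount": 50.0, "day": date(2018, 9, 1)},
    {"account": "a2", "amount": 10.0, "day": date(2019, 5, 20)},
]

def derive_features(rows, as_of, window_days=180):
    """Derive per-account candidate features from raw transactions."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    # Aggregation with a time dimension: keep only rows inside the window.
    for row in rows:
        if 0 <= (as_of - row["day"]).days <= window_days:
            totals[row["account"]] += row["amount"]
            counts[row["account"]] += 1
    features = {}
    for account, total in totals.items():
        features[account] = {
            "sum_amount_6m": total,
            "txn_count_6m": counts[account],
            # Transformation of a single variable: log transform.
            "log_sum_amount_6m": math.log1p(total),
            # Computation on two variables: ratio of sum to count.
            "avg_amount_6m": total / counts[account],
        }
    return features

features = derive_features(transactions, as_of=date(2019, 6, 30))
```

Each derived column is only a candidate; whether it is kept is decided later by feature selection.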
Different means can be used to obtain features for different variables. For example, for text variables, the Capsule Network algorithm (capsule or vector neural network, hereinafter "capsule network") can be used to obtain features; for sequence variables, LSTM (long short-term memory network) can be used; for historical-information summary (velocity) variables, genetic algorithms and reinforcement learning can be used; and for variable combination, FTRL (Follow The Regularized Leader) can be used to combine features.
For text variables, a capsule network replaces the single neuron nodes of a traditional neural network with neuron vectors and trains this new kind of network by dynamic routing. It can generate features for part-whole relationships, so that learned knowledge generalizes automatically to new scenarios. In other words, the capsule network introduces a new building block that better expresses the hierarchical relationships between features: a capsule network has translation equivariance (rather than translation invariance) and can recognize the relative positions or relationships between different features, so it can achieve broader generalization with less data.
As a special kind of RNN, an LSTM network has a chain structure and can learn long-term dependencies and remember long-term information, so it is suited to obtaining features of sequence variables.
For historical-information summary variables, genetic algorithms and reinforcement learning can be used to obtain features. Both are search methods. A genetic algorithm, for example, operates on all individuals in a population and searches the encoded feature space efficiently through the genetic operations of selection, crossover, and mutation, so as to find candidate features that fit the business need quickly and accurately.
Combining features with FTRL (Follow The Regularized Leader) gives the model the ability to capture online feature changes in real time, which lays the foundation for breaking fixed-dimension limits and dynamically adding and removing features.
Those skilled in the art will understand that different methods can be used to generate or obtain features for different variables; this is not described further here.
Automatic feature selection over the automatically generated features and the original features generally considers two aspects:
Whether the feature diverges: if a feature does not diverge, for example its variance is close to 0, the samples show essentially no difference on that feature, and the feature is of no use for distinguishing samples.
The correlation between the feature and the target: features that are highly correlated with the target should be preferred.
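The two criteria above can be sketched as a simple filter. The thresholds, the use of Pearson correlation as the correlation measure, and the toy feature columns are all assumptions chosen for illustration; the source does not prescribe them.

```python
from statistics import mean, pvariance

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either side is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def filter_features(columns, target, min_variance=1e-6, min_abs_corr=0.3):
    """Keep features that vary across samples and correlate with the target."""
    kept = []
    for name, values in columns.items():
        if pvariance(values) < min_variance:
            continue  # criterion 1: the feature does not diverge
        if abs(pearson(values, target)) < min_abs_corr:
            continue  # criterion 2: weak correlation with the target
        kept.append(name)
    return kept

# Hypothetical candidate features and fraud labels.
columns = {
    "constant_flag": [1, 1, 1, 1, 1, 1],
    "night_txn_ratio": [0.1, 0.2, 0.1, 0.8, 0.9, 0.7],
    "noise": [3, 1, 2, 2, 1, 3],
}
target = [0, 0, 0, 1, 1, 1]
selected = filter_features(columns, target)
```

Here `constant_flag` fails the variance check and `noise` fails the correlation check, so only `night_txn_ratio` survives.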
In practice, both aspects can be measured through feature importance. For example, feature importance in LightGBM can be measured either by the number of times a feature is used for splitting or by the gain obtained from splits on that feature. In general, different measures produce different importance rankings, so features can be cross-selected using several evaluation criteria, such as Permutation Importance and K-Fold Feature Importance.
In the Permutation Importance method, if replacing a feature with random values makes model performance drop sharply, the feature is important; otherwise it is not. The K-Fold Feature Importance method selects features through K-fold cross-validation, comparing the predictive performance of the model under different feature combinations.
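As a rough illustration of the permutation idea, the sketch below shuffles one feature column at a time and measures the average drop in accuracy. It assumes a toy rule-based model and made-up rows; shuffling the existing values is used here as the usual way of "randomizing" a feature, which is an interpretation rather than something the text specifies.

```python
import random

def accuracy(model, rows, labels):
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(model, rows, labels, feature, n_repeats=20, seed=0):
    """Average accuracy drop after shuffling one feature across rows."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for _ in range(n_repeats):
        shuffled = [r[feature] for r in rows]
        rng.shuffle(shuffled)
        permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy(model, permuted, labels))
    return sum(drops) / n_repeats

# Toy model (an assumption): flags any transaction above 500 as risky.
def toy_model(row):
    return 1 if row["amount"] > 500 else 0

pairs = [(100, 3), (900, 14), (50, 22), (700, 9),
         (20, 1), (800, 17), (300, 12), (650, 5)]
rows = [{"amount": a, "hour": h} for a, h in pairs]
labels = [toy_model(r) for r in rows]

importance_amount = permutation_importance(toy_model, rows, labels, "amount")
importance_hour = permutation_importance(toy_model, rows, labels, "hour")
```

Because the toy model never reads `hour`, shuffling it changes nothing and its importance is zero, while shuffling `amount` degrades accuracy.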
The key steps in automatic feature selection are feature subset search and feature subset evaluation. Combining a subset search mechanism with a subset evaluation mechanism yields a feature selection method. Feature selection can be used to reduce the number of features and the dimensionality, which strengthens the model's generalization and reduces overfitting, and to improve understanding of the features and their values. Through automatic feature selection, the most effective variable list for a given scenario or risk can be chosen from the existing variable pool plus the automatically generated features.
After feature selection is complete, automatic parameter tuning is needed. Parameters are divided into model parameters and hyperparameters. Model parameters are learned by the model from the distribution of the training data and do not require human priors. Hyperparameters are set before the learning process starts and are not obtained through training. Hyperparameters usually need to be optimized: an optimal set of hyperparameters is chosen for the model to improve the performance and effect of learning. Common hyperparameter tuning methods include grid search, random search, and Bayesian optimization. In one embodiment of the disclosure, Bayesian optimization is used for automatic parameter tuning. Those skilled in the art will understand that other parameter tuning methods may be chosen; this is not described further here.
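Two of the tuning methods named above, grid search and random search, are simple enough to sketch directly. The objective `validation_loss` is a hypothetical stand-in for "train a model with these hyperparameters and return its validation loss", and the search space is invented for illustration. Bayesian optimization, which the embodiment prefers, is not shown because it needs a surrogate model that goes beyond a short sketch.

```python
import itertools
import random

def validation_loss(params):
    """Hypothetical stand-in for training a model and scoring it."""
    return (params["learning_rate"] - 0.1) ** 2 + 0.01 * (params["max_depth"] - 6) ** 2

def grid_search(space):
    """Evaluate every combination in the search space."""
    names = list(space)
    candidates = (dict(zip(names, combo))
                  for combo in itertools.product(*space.values()))
    return min(candidates, key=validation_loss)

def random_search(space, n_trials=30, seed=0):
    """Evaluate a fixed number of randomly sampled combinations."""
    rng = random.Random(seed)
    trials = [{name: rng.choice(values) for name, values in space.items()}
              for _ in range(n_trials)]
    return min(trials, key=validation_loss)

space = {"learning_rate": [0.01, 0.05, 0.1, 0.3], "max_depth": [3, 6, 9]}
best_grid = grid_search(space)
best_random = random_search(space)
```

Grid search is exhaustive over the listed values, so it cannot do worse than random search on the same discrete space; random search becomes attractive when the space is too large to enumerate.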
It is generated as a result, by automated characterization, automated characterization selection and automatic adjust are joined, so that it may construct the new mould for being suitble to new business Type.
Hereinafter with reference to Fig. 5 and Fig. 6 detailed description according to the process of the building new model of one embodiment of the disclosure.
106, via transfer learning training default model and new model.
There are interesting linear positive correlations between amount of training data required for the scale and model of model.In general, The scale of model should be sufficiently large, could sufficiently capture the connection (such as texture and shape in image) of different piece between data With the detailed information (such as quantity of classification) of problem to be solved.Height of the level of model front end commonly used to capture input data Grade connection (such as image border and main body etc.).The level of model rear end helps to make the letter finally determined commonly used to capture Breath (detailed information for usually being used to distinguish target output).Therefore, higher (such as image classification of the complexity of problem to be solved Deng), then the number of parameter and required amount of training data are also bigger.
In most cases, in face of a certain particular problem in a certain field, it is less likely to find adequately training enough Data.But have benefited from transfer learning technology, and from the obtained model of other data sources training, by certain modification and perfect, It can be multiplexed in similar field.Transfer learning can be regarded as defining multiple source domains (source domain) and one A target domain (target domain) learns in source domain, and the knowledge migration that study is arrived is to target Domain promotes the learning effect (or performance) of target domain.
The basic ideas of transfer learning are to have passed through the trained mould of ready-made data set using pre-training model Type.Developer needs to find the level that can export reusable feature in pre-training model, then utilizes the output of the level The smaller neural network of the scale for training those to need parameter less as input feature vector.Before this due to pre-training model The acquistion enterprise schema of data (pattern), therefore the network of the small-scale only needs to ask in learning data for specific The specific connection of topic can.
The advantages of transfer learning is brought is not limited to reduce the scale of training data, can also effectively avoid overfitting (overfit), this is because transfer learning allows model that study is unfolded for different types of data, therefore it is being captured wait solve Performance in terms of the inner link of problem is also just more excellent.
In one embodiment of the disclosure, using multi-task learning, (Multi-task learning is the one of transfer learning Kind) Lai Xunlian default model and new model.It is a little concentrated in individual task due to being generally concerned with, can ignore may help degree of optimization The other information of figureofmerit, such as the training signal from inter-related task.Multi-task learning passes through between shared inter-related task It characterizes (for example, shared data, sharing feature, shared parameter etc.), model can be made preferably to summarize ancestral task.Multitask Study is also a kind of conclusion migration mechanism, improves generalization ability by using shared characterization parallel training multiple tasks.It concludes The method that the knowledge for solving the problems, such as one is applied to relevant issues is absorbed in migration, to improve the efficiency of study.In addition, by When being predicted simultaneously using shared characterization multiple tasks, reduce the quantity of data source and the rule of overall model parameter Mould, therefore keep prediction more efficient.It will be understood by those skilled in the art that may be selected to use other transfer learning methods, This is repeated no more.
As a result, the model capabilities of existing businesses and sites can be transplanted quickly to other businesses and sites, so that a model built for a new scenario can be deployed quickly even with only a small amount of data and labels, and can still achieve relatively good performance.
At 108, the trained default model and the trained new model are automatically fused to generate a fusion model.
The preceding steps yield a variety of multi-dimensional features and multiple models. Ensemble learning (Ensemble Learning) can make efficient use of these features and models by fusing models, thereby improving the performance of the online model. Ensemble learning accomplishes a learning task by constructing and combining multiple learners. It is generally considered that combining multiple learners is more accurate than relying on a single learner. To obtain a good ensemble, the base learners should be reasonably accurate and also diverse, that is, they should differ from one another, so that the ensemble retains a certain generalization ability.
Common ensemble learning frameworks include bagging (parallel fusion), boosting (sequential fusion), and stacking (stacked fusion). In one embodiment of the present disclosure, the stacking framework is used. Specifically, a scorecard approach is adopted: multiple models first produce scores, the scores are then binned, a logistic regression model is trained on the binned scores, and finally a weighted score is computed.
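The scorecard-style stacking described above can be sketched as follows. This is an illustrative sketch, not the implementation of the disclosure: the bin boundaries, the toy base-model scores, and all function names are assumptions, and the "weighted score" is read here as the output of the logistic regression over the one-hot encoded bins.

```python
import math

def bin_index(score, edges):
    """Return the index of the bin that a score in [0, 1] falls into."""
    return sum(score >= e for e in edges)

def one_hot_bins(scores, edges):
    """Bin each base-model score and concatenate the one-hot encodings."""
    n_bins = len(edges) + 1
    row = []
    for s in scores:
        encoded = [0.0] * n_bins
        encoded[bin_index(s, edges)] = 1.0
        row.extend(encoded)
    return row

def logistic(w, b, x):
    return 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))

def fit_logistic(rows, labels, epochs=500, lr=0.5):
    """Plain batch gradient descent on log loss; returns (weights, bias)."""
    w, b = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in zip(rows, labels):
            err = logistic(w, b, x) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(rows)
    return w, b

def fused_score(scores, edges, w, b):
    """Weighted score of the fusion model for one event."""
    return logistic(w, b, one_hot_bins(scores, edges))

# Hypothetical scores from two base models (e.g. default model, new model).
edges = [0.33, 0.66]                       # assumed bin boundaries
base_scores = [(0.9, 0.8), (0.7, 0.9), (0.8, 0.4),
               (0.2, 0.1), (0.1, 0.3), (0.4, 0.2)]
labels = [1, 1, 1, 0, 0, 0]
rows = [one_hot_bins(s, edges) for s in base_scores]
w, b = fit_logistic(rows, labels)
high = fused_score((0.85, 0.9), edges, w, b)
low = fused_score((0.15, 0.2), edges, w, b)
```

Binning before the regression makes each learned weight a per-bin contribution, which is what gives a scorecard its readable, additive form.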
Automatic multi-model fusion thus enables fast integration of multiple models in the basic model library or multiple newly built models, which significantly improves the performance of the online model.
At 110, the trained default model is used as the online model, and the trained new model and the fusion model are used as backup models.
At 112, when one of the backup models outperforms the online model, that backup model replaces the online model.
Once the trained default model, the trained new model, and the fusion model are available, champion/challenger testing or A/B testing is used to compare the online model (that is, the established strategy, or champion strategy) with one or more candidate models (that is, challenger models).
In champion/challenger mode, the trained default model is usually used as the online model, because it is built from existing modules, while the trained new model and the fusion model are used as backup models. Once a backup model is found to perform better than the online model, the backup model is brought online to replace it, and the former online model becomes a backup model. This keeps the online model at its best performance.
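The swap between the online model and the backup models can be sketched as below. The function `promote_best`, the `margin` parameter, the model names, and the metric values are hypothetical; the disclosure does not specify which metric is compared.

```python
def promote_best(champion, challengers, metrics, margin=0.0):
    """Return (new_champion, new_backups) given one metric value per model name.

    A challenger replaces the champion only if it beats it by more than margin;
    the displaced champion is kept as a backup model.
    """
    best = max(challengers, key=lambda name: metrics[name], default=None)
    if best is not None and metrics[best] > metrics[champion] + margin:
        backups = [m for m in challengers if m != best] + [champion]
        return best, backups
    return champion, list(challengers)

# Hypothetical evaluation results, higher is better.
metrics = {"default_model": 0.81, "new_model": 0.79, "fusion_model": 0.86}
champion, backups = promote_best(
    "default_model", ["new_model", "fusion_model"], metrics)
```

A non-zero `margin` is one way to avoid swapping models on differences that are within measurement noise.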
Fig. 2 shows the schematic diagrames for the efficiently method of building risk control model according to one embodiment of the disclosure.
The scheme for efficiently constructing risk control model that the disclosure proposes is realized by air control engine.Air control engine Not only include conventional supervised learning algorithm based on intelligence, efficient risk identification algorithm system, further includes a large amount of based on deep Degree study without prison feature generating algorithm and other supervision and unsupervised concept except algorithm.
Air control engine is by risk perceptions (perceiving center by air control engine to execute), risk identification, intelligent decision (by air control Intelligent engine center executes), Intelligent evolution (executes) system by air control engine center of serve and constructed.Based on air control engine, no It is only capable of carrying out real-time risk scanning to the payment behavior of each user, and excavates and optimize by constantly newly-increased feature of risk The model of algorithm iteration, can the real-time risk resisting of automatic attaching user behavior characteristics progress.Further, air control engine may be used also It migrates system automatically according to transaction flow, risk attack variation, user behavior, adjusts to dynamic and intelligent the control of air control engine Intensity, risk bother rate significant decrease.
For the scheme for efficiently constructing risk control model, actually using constantly newly-increased for constructing Feature of risk is excavated and the knowledge migration formula model construction system of the model of optimization algorithm iteration.Knowledge migration formula model construction body System can perceive center by air control engine, new risk, new site and new business be perceived, thus Quick thread model.And air control Engine center of serve can be provided to the necessary monitoring capacity of the model built, AB power of test and model rollback ability.With This simultaneously, the model built can be supplied to air control intelligent engine center, to carry out risk identification and intelligent decision.
Knowledge migration formula model construction system includes three big modules, is selection (Selection), reproduction respectively (Reproduction) and intersect (Crossover).
Selecting module includes two submodules in basic model library and champion/challenger.
In the submodule of basic model library, it will thus provide build the ability of default model.It is extract from air control system Risk module includes masters, passive side, equipment, environment, behavior, relationship, conflict, mutation and FTG (Fraud to Gross). It can be directed to the different basic model of different module constructions, such as be directed to masters, account value can be built, vulnerable to deceitful Group model etc..It, can be with unrestricted choice basic model library when triggering new business, new site after having basic model library In model, Lai Zidong, which is built, is suitble to business/website default model.
For certain scene, certain business, model on the line directly having an impact to business is in general had;In addition, can also There is backup model not have an impact directly in marking to business simultaneously, this mode is champion/challenger (Champion& Challenger).In this champion/challenger's submodule, when finding that backup model is more preferable than modelling effect on line, backup Model can online substitution go offline upper mold type, and model will become backup model on line, guarantee that model is constantly on line with this Optimum performance.
The reproduction module mainly provides automatic modeling capability and includes automatic feature generation, automatic feature selection, and automatic parameter tuning submodules.
When a new model is built automatically, different variables can be learned or characterized automatically through feature engineering.
Automatic feature generation automatically constructs candidate features relevant to the target task from a data set. Feature derivation operations in automatic feature generation are divided into transformation, calculation, and aggregation; that is, original features are transformed, calculated, and aggregated to generate new candidate features.
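The three kinds of derivation (transformation, calculation, aggregation) can be illustrated with a small sketch. The record fields (`account`, `amount`, `hour`) and the derived feature names are hypothetical and chosen only for the example.

```python
import math
from collections import defaultdict

def generate_features(transactions):
    """Derive candidate features per account from raw transaction records."""
    by_account = defaultdict(list)
    for t in transactions:
        by_account[t["account"]].append(t)
    features = {}
    for account, rows in by_account.items():
        amounts = [r["amount"] for r in rows]
        features[account] = {
            # transformation: compress the heavy-tailed amount scale
            "log_last_amount": math.log1p(amounts[-1]),
            # calculation: share of transactions made before 06:00
            "night_ratio": sum(r["hour"] < 6 for r in rows) / len(rows),
            # aggregation: summarize the history of the account
            "txn_count": len(rows),
            "total_amount": sum(amounts),
        }
    return features

transactions = [
    {"account": "a1", "amount": 100.0, "hour": 2},
    {"account": "a1", "amount": 300.0, "hour": 14},
    {"account": "b2", "amount": 50.0, "hour": 3},
]
features = generate_features(transactions)
```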
Automatic feature selection can be carried out by measuring feature importance. Its key steps are feature subset search and feature subset evaluation; combining a feature subset search mechanism with a feature subset evaluation mechanism yields a feature selection method. Feature selection reduces the number of features (dimensionality reduction), strengthens model generalization, and reduces overfitting; it also improves understanding of the features and their values. Through automatic feature selection, the most effective variable list for a given scenario or risk can be chosen from the existing variable pool plus the automatically generated features.
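One common way to combine a subset search mechanism with a subset evaluation mechanism is greedy forward selection, sketched below. The disclosure does not name a specific search strategy, so this choice is an assumption; the feature names and the toy `evaluate` function (which stands in for something like a cross-validated metric) are hypothetical.

```python
def forward_select(candidates, evaluate, max_features=None):
    """Greedy forward search: repeatedly add the feature that most improves evaluate()."""
    selected, best_score = [], evaluate([])
    limit = max_features or len(candidates)
    while len(selected) < limit:
        trials = [(evaluate(selected + [f]), f)
                  for f in candidates if f not in selected]
        if not trials:
            break
        score, feature = max(trials)
        if score <= best_score:            # no remaining feature helps
            break
        selected.append(feature)
        best_score = score
    return selected, best_score

# Toy evaluation: per-feature usefulness, a cost per feature, and a penalty
# when two redundant features are selected together.
USEFULNESS = {"txn_count_7d": 0.30, "night_ratio": 0.20,
              "device_age": 0.05, "txn_count_30d": 0.28}
REDUNDANT = {frozenset(("txn_count_7d", "txn_count_30d")): 0.25}
COST = 0.06

def evaluate(subset):
    score = sum(USEFULNESS[f] for f in subset) - COST * len(subset)
    for pair, overlap in REDUNDANT.items():
        if pair <= set(subset):
            score -= overlap
    return score

selected, score = forward_select(list(USEFULNESS), evaluate)
```

In this toy run the search keeps `txn_count_7d` and `night_ratio` and stops, because adding either remaining feature lowers the evaluation score.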
After feature selection is complete, automatic parameter tuning is performed. Parameters are divided into model parameters and hyperparameters. Model parameters are learned by the model from the distribution of the training data and do not require human prior knowledge. Hyperparameters are set before the learning process begins rather than obtained through training. Hyperparameters usually need to be optimized, that is, a set of optimal hyperparameters is selected for the model, to improve learning performance and effectiveness. Common hyperparameter tuning methods include grid search, random search, and Bayesian optimization. In one embodiment of the present disclosure, Bayesian optimization is used for automatic parameter tuning. Those skilled in the art will appreciate that other parameter tuning methods may be used; they are not described further here.
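The embodiment uses Bayesian optimization. The sketch below deliberately swaps in random search, one of the other methods listed above, because it fits in a few lines of standard-library Python; it is not the tuning procedure of the disclosure. The objective `validation_score`, the search space, and the hyperparameter names are hypothetical stand-ins for "train with these hyperparameters and score on validation data".

```python
import random

def random_search(objective, space, n_trials=50, seed=0):
    """Sample hyperparameters uniformly and keep the best-scoring set."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def validation_score(params):
    # Toy objective with its maximum at learning_rate=0.1, l2=1.0.
    return -((params["learning_rate"] - 0.1) ** 2) - ((params["l2"] - 1.0) ** 2)

space = {"learning_rate": (0.001, 0.5), "l2": (0.0, 5.0)}
best_params, best_score = random_search(validation_score, space)
```

Bayesian optimization differs in that each new trial is chosen using a surrogate model of the scores seen so far, rather than sampled independently.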
The crossover module includes two submodules: multi-task learning and multi-model fusion.
The multi-task learning submodule can quickly transplant the model capabilities of existing businesses and sites to other businesses and sites, mainly through transfer learning. In one embodiment of the present disclosure, multi-task learning (a form of transfer learning) is used, and models are trained by sharing data, features, parameters, and the like. Thus, when a model is built for a new scenario, it can be developed and deployed quickly even with only a small amount of data and labels, while still achieving relatively good performance.
The multi-model fusion submodule automatically integrates models and is a powerful tool for improving model performance. In one embodiment of the present disclosure, a scorecard approach is used: the scores of multiple models are binned, a logistic regression model is trained on the binned scores, and a weighted score is then computed. Automatic multi-model fusion enables fast integration of multiple models in the basic model library or multiple newly built models, thereby significantly improving model performance.
The scheme for efficiently updating a risk control model uses a knowledge-enhanced model update system for updating models that iterate through continuously added risk feature mining and algorithm optimization. Through the risk control engine perception center, the knowledge-enhanced model update system can perceive new risks, new sites, and new businesses, and thus update models quickly. The risk control engine service center provides the necessary monitoring capability, A/B testing capability, and model rollback capability for the updated models. Meanwhile, the updated models are supplied to the risk control engine intelligence center for risk identification and intelligent decision-making.
The knowledge-enhanced model update system includes three major modules: self-tuning (Self-tuning), mutation (Mutation), and adaptation (Adaptation).
The self-tuning module implements automatic model refitting. Automatic refitting is enabled when a trigger condition is met: new training samples are introduced from the data warehouse and added to the training sample pool, different sample sets are formed by automatically selecting training samples from the pool, and the risk control model is refitted with the different sample sets.
Trigger conditions include detecting a decline or abnormal fluctuation in the monitored performance of the risk control model. Alternatively, the trigger condition can be a time condition, that is, a regular trigger time such as weekly (Week+1) or daily (Day+1). Automatic model refitting can also be triggered manually.
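The three kinds of trigger (performance change, schedule, manual) can be sketched as a single check. The metric (AUC), the threshold `max_change`, the default interval, and the function name `should_refit` are illustrative assumptions rather than values from the disclosure.

```python
from datetime import datetime, timedelta

def should_refit(baseline_auc, current_auc, last_fit, now,
                 max_change=0.02, interval=timedelta(days=7), manual=False):
    """Return the reason a refit should run, or None if no trigger fired."""
    if manual:
        return "manual"
    if abs(baseline_auc - current_auc) > max_change:
        return "performance_change"        # decline or abnormal fluctuation
    if now - last_fit >= interval:
        return "scheduled"                 # e.g. Week+1; use 1 day for Day+1
    return None

t0 = datetime(2024, 1, 1)
reason_drop = should_refit(0.90, 0.85, t0, t0 + timedelta(days=1))
reason_time = should_refit(0.90, 0.895, t0, t0 + timedelta(days=7))
reason_none = should_refit(0.90, 0.895, t0, t0 + timedelta(days=1))
reason_manual = should_refit(0.90, 0.90, t0, t0, manual=True)
```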
The mutation module implements automatic model retraining. Automatic retraining relies on integrated modeling automation: it is triggered by perceiving new risks and new businesses (for example, changes in data distribution or changes in newly added events), and it searches for the optimal model by varying the algorithm, model parameters, and so on.
The adaptation module includes an online learning module. Through online learning, frequent changes in risk patterns can be perceived from streaming data, so that the risk control model iterates quickly. As each transaction whose risk has been determined (that is, each labeled transaction) arrives, the model can be updated iteratively by an online learning algorithm (for example, FTRL or Online Random Forests).
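As one example of such an online update, the sketch below implements per-coordinate FTRL-Proximal for logistic regression, assuming sparse features given as dictionaries. It is a minimal illustration, not the engine's implementation: the hyperparameter values, the feature names, and the toy stream are assumptions.

```python
import math
from collections import defaultdict

class FTRLProximal:
    """Per-coordinate FTRL-Proximal for logistic regression on sparse features."""

    def __init__(self, alpha=0.5, beta=1.0, l1=0.1, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = defaultdict(float)   # accumulated adjusted gradients
        self.n = defaultdict(float)   # accumulated squared gradients

    def _weight(self, key):
        z = self.z[key]
        if abs(z) <= self.l1:
            return 0.0                # L1 term keeps weak features at exactly zero
        sign = 1.0 if z > 0 else -1.0
        return -(z - sign * self.l1) / (
            (self.beta + math.sqrt(self.n[key])) / self.alpha + self.l2)

    def predict(self, x):
        logit = sum(self._weight(k) * v for k, v in x.items())
        logit = max(min(logit, 35.0), -35.0)   # guard against overflow in exp
        return 1.0 / (1.0 + math.exp(-logit))

    def update(self, x, y):
        """Consume one labeled event; returns the prediction made before the update."""
        p = self.predict(x)
        for k, v in x.items():
            g = (p - y) * v
            sigma = (math.sqrt(self.n[k] + g * g) - math.sqrt(self.n[k])) / self.alpha
            self.z[k] += g - sigma * self._weight(k)
            self.n[k] += g * g
        return p

# Hypothetical stream of labeled transactions (1 = fraud, 0 = legitimate).
model = FTRLProximal()
stream = [({"new_device": 1.0, "night": 1.0}, 1), ({"known_device": 1.0}, 0)] * 50
for event_features, label in stream:
    model.update(event_features, label)
risky = model.predict({"new_device": 1.0, "night": 1.0})
safe = model.predict({"known_device": 1.0})
```

Each call to `update` touches only the coordinates present in the event, which is what makes this style of learner suitable for high-volume streaming data.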
The process of updating the risk control model with the knowledge-enhanced model update system is described in detail below with reference to Figs. 8 and 9.
Fig. 3 shows a flowchart of a process 300 for building a default model according to one embodiment of the present disclosure.
At 302, risk modules are extracted for each scenario. The extracted risk modules may include active party, passive party, device, environment, behavior, relationship, conflict, mutation, FTG (Fraud to Gross), and the like. These risk modules can be characterized as variables, which are divided into: historical-information summary variables (velocity class); derived variables, including individual variation and group probability; relationship variables; and so on.
At 304, a basic model is constructed for each risk module, and a basic model library is built from these basic models.
Different basic models can be built for each risk module or variable to construct the basic model library. For example, for the active party among identity variables, basic models such as account maturity, information-leak population, theft-prone population, and security-conscious population can be built from user gray lists, transaction history, and the like. For behavior, basic models such as account operation behavior, verification interaction behavior, scenario migration behavior, and fund flow behavior can be built from an account's short-term behavior and long-term historical behavior. For devices, basic models such as abnormal login device, abnormal operation device, abnormal tampering device, and Trojan-operated device can be built. Similarly, for addresses, basic models such as abnormal login address, abnormal operation address, abnormal tampering address, and false address can be built. For relationships, basic models such as counterparty relationship, scenario relationship, content relationship, and location relationship can be built. Those skilled in the art will appreciate that different basic models can be built for different risk modules or variables according to their types; they are not described further here.
At 306, when a new business is triggered, the corresponding basic models in the basic model library are selected. When a new business or new site is triggered, models in the basic model library can be freely selected so as to automatically build a default model suited to that business or site.
At 308, a default model suited to the new business is built from the corresponding basic models. Building the default model in effect involves merging multiple variables for modeling. Those skilled in the art will appreciate that, for different businesses or sites, different models in the basic model library can be chosen to carry out merged modeling of different variables.
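Steps 304 to 308 can be pictured as a registry of basic models from which a selection is assembled into a default model. The sketch below is illustrative only: the library contents, the scoring rules inside each lambda, the event fields, and `build_default_model` are all hypothetical, and the merged variables are simply returned as a dictionary rather than fed to a trained model.

```python
from typing import Callable, Dict, List

Event = Dict[str, float]
BaseModel = Callable[[Event], float]

# Hypothetical basic model library keyed by risk module, then by model name.
BASE_MODEL_LIBRARY: Dict[str, Dict[str, BaseModel]] = {
    "active_party": {
        "account_maturity": lambda e: min(e["account_age_days"] / 365.0, 1.0),
    },
    "device": {
        "abnormal_login_device": lambda e: 1.0 if e["new_device"] else 0.0,
    },
    "behavior": {
        "fund_flow": lambda e: min(e["amount"] / 10000.0, 1.0),
    },
}

def build_default_model(selection: Dict[str, List[str]]) -> Callable[[Event], Dict[str, float]]:
    """Pick basic models by (module, name); return a scorer producing merged variables."""
    chosen = {f"{module}.{name}": BASE_MODEL_LIBRARY[module][name]
              for module, names in selection.items() for name in names}

    def score(event: Event) -> Dict[str, float]:
        return {key: model(event) for key, model in chosen.items()}

    return score

# A new business selects only the modules relevant to it.
default_model = build_default_model(
    {"active_party": ["account_maturity"], "device": ["abnormal_login_device"]})
variables = default_model({"account_age_days": 73.0, "new_device": 1.0, "amount": 500.0})
```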
Fig. 4 shows a schematic diagram of a process for building a default model according to another embodiment of the present disclosure.
During risk control, numerous risk modules can be extracted, including active party, passive party, device, environment, behavior, relationship, conflict, mutation, FTG (Fraud to Gross), and the like. These risk modules are effectively characterized as variables, which can be divided into: historical-information summary variables (velocity class); derived variables, including individual variation and group probability; relationship variables; and so on.
For each risk module or variable, different basic models can be built to construct the basic model library. For example, for the active party among identity variables, basic models such as account maturity, information-leak population, theft-prone population, and security-conscious population can be built. For behavior, basic models such as account operation behavior, verification interaction behavior, scenario migration behavior, and fund flow behavior can be built. For devices, basic models such as abnormal login device, abnormal operation device, abnormal tampering device, and Trojan-operated device can be built. Similarly, for addresses, basic models such as abnormal login address, abnormal operation address, abnormal tampering address, and false address can be built. For relationships, basic models such as counterparty relationship, scenario relationship, content relationship, and location relationship can be built.
When a new business or new site is triggered, models in the basic model library can be freely selected so as to automatically build a default model suited to that business or site.
Fig. 5 shows a flowchart of a process 500 for automatically constructing a new model according to one embodiment of the present disclosure.
At 502, an original variable pool is obtained.
Take an account-to-account transfer as an example. The two parties involved are the paying account and the receiving account. In a risk control event, besides the transaction behavior of the accounts, there is also information such as the accounts' operation behavior and logs. The behavior of the paying account as the active party includes paying, changing the password, adding friends, changing the profile picture, and so on; the behavior of the receiving account as the passive party includes receiving payment, being reported, being added as a friend, and so on. That is, for a transfer transaction, the original variables are mainly the behavior of the paying account and the behavior of the receiving account.
Take a transfer-to-card scenario as another example. The existing variables are group variables and FTG variables. The FTG variables currently characterized in the transfer-to-card scenario cover dimensions such as city, age, and card BIN (Bank Identification Number).
Those skilled in the art will appreciate that different original variables can be obtained in different scenarios to form the original variable pool.
At 504, features of different types are automatically generated from the original variables in the original variable pool.
For an account-to-account transfer, sequence mining can be performed on the behavior of the paying account and the behavior of the receiving account, and the account's short-term behavior and long-term historical behavior can be mined over time windows of different lengths to identify abnormal account behavior sequences and thereby improve fraud prevention and control.
For example, multiple behavior sequences can be constructed, such as the paying account's real-time event sequence, real-time RPC sequence, and historical event sequence, and the receiving account's real-time event sequence, real-time RPC sequence, and historical event sequence. Further, based on these behavior sequences, the real-time sequences of the paying account and the receiving account can, for example, be merged into one vector as the active-party sequence and the passive-party sequence.
In the transfer-to-card scenario, the difficulty lies in preventing and controlling risk on new cards. To further control new-card risk, starting from the idea of group and FTG variables, deep learning sequence modeling is used to generate card-level embeddings, which are then aggregated to the card BIN level. The aggregated embedding captures the behavioral information of the card BIN, so that for a new card, its card BIN behavioral features can be obtained as long as its card BIN has appeared before. In this scenario, embedding turns the sparse feature matrix into a dense matrix, which both generates features of a different type and reduces dimensionality.
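The aggregation from card-level embeddings to card BIN level, and the fallback for a new card, can be sketched as below. The card-level vectors are assumed to come from an upstream sequence model that is not shown; averaging as the aggregation, a six-digit BIN, and the card numbers themselves are assumptions made for the example.

```python
from collections import defaultdict

def bin_of(card_number, bin_length=6):
    """Assumption: the BIN is the leading bin_length digits of the card number."""
    return card_number[:bin_length]

def aggregate_by_bin(card_embeddings):
    """Average card-level embeddings into one embedding per card BIN."""
    grouped = defaultdict(list)
    for card, vector in card_embeddings.items():
        grouped[bin_of(card)].append(vector)
    return {
        card_bin: [sum(column) / len(vectors) for column in zip(*vectors)]
        for card_bin, vectors in grouped.items()
    }

def embedding_for(card_number, card_embeddings, bin_embeddings):
    """Use the card's own embedding if known, else fall back to its BIN embedding."""
    if card_number in card_embeddings:
        return card_embeddings[card_number]
    return bin_embeddings.get(bin_of(card_number))

# Made-up card numbers with two-dimensional embeddings.
card_embeddings = {
    "6222021111111111": [0.2, 0.8],
    "6222022222222222": [0.4, 0.6],
    "4000003333333333": [1.0, 0.0],
}
bin_embeddings = aggregate_by_bin(card_embeddings)
new_card_vector = embedding_for("6222029999999999", card_embeddings, bin_embeddings)
unknown_bin_vector = embedding_for("5100009999999999", card_embeddings, bin_embeddings)
```

A card never seen before still receives a dense behavioral vector whenever its BIN has appeared, which is the property the passage relies on for new-card risk control.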
Those skilled in the art will appreciate that, in different scenarios, different methods can be used to automatically generate features of different types from the original variables.
At 506, variables suited to the scenario are selected from the original variable pool and the automatically generated features to produce a variable list.
The features in the original variable pool and the automatically generated features can be merged into one variable pool, from which a matching variable list is then selected for a specific scenario.
Feature selection can be carried out by measuring feature importance. The key steps of automatic feature selection are feature subset search and feature subset evaluation; combining a feature subset search mechanism with a feature subset evaluation mechanism yields a feature selection method. Through automatic feature selection, the most effective variable list for a given scenario or risk can be chosen from the existing variable pool plus the automatically generated features.
At 508, automatic parameter tuning is performed for the variable list.
After the variable list has been chosen, automatic parameter tuning is performed. Parameters are divided into model parameters and hyperparameters. Model parameters are learned by the model from the distribution of the training data and do not require human prior knowledge. Hyperparameters are set before the learning process begins rather than obtained through training. Hyperparameters usually need to be optimized, that is, a set of optimal hyperparameters is selected for the model, to improve learning performance and effectiveness. Common hyperparameter tuning methods include grid search, random search, and Bayesian optimization. In one embodiment of the present disclosure, Bayesian optimization is used for automatic parameter tuning. Those skilled in the art will appreciate that other parameter tuning methods may be used; they are not described further here.
At 510, a new model suited to the scenario is obtained.
Thus, through feature generation, feature selection, and automatic parameter tuning, a new model suited to the scenario is obtained.
Fig. 6 shows a schematic diagram of a process for automatically constructing a new model according to another embodiment of the present disclosure.
The construction of the new model starts from receiving existing data as input, which includes events and labels. Events and labels can correspond to different variables/features, which are the original variables/features.
The construction of the new model includes feature generation, feature selection, and automatic parameter tuning.
Feature generation forms features of different categories from the original variable pool and the automatically generated features, such as event attribute features (property), event accumulation features (velocity), event sequence features (sequence), relationship topology features (graph), text representation features (text info), and variable combination features (variable combination).
Based on these features of different categories, matching variables can be selected for a specific scenario, that is, feature selection is performed. Feature selection can be carried out by measuring feature importance. The key steps of automatic feature selection are feature subset search and feature subset evaluation; combining a feature subset search mechanism with a feature subset evaluation mechanism yields a feature selection method. Through automatic feature selection, the most effective variable list for a given scenario or risk can be chosen from the existing variable pool plus the automatically generated features.
After the variable list has been chosen, automatic parameter tuning is performed. Common parameter tuning methods include grid search, random search, and Bayesian optimization. Those skilled in the art will appreciate that a particular parameter tuning method can be used as needed; it is not described further here.
Through feature generation, feature selection, and automatic parameter tuning, a new model suited to the scenario can be output.
The present disclosure proposes a scheme for efficiently constructing a risk control model. In the art, the current process for creating a new model is complex, and its data cleaning, model training, and model deployment all consume substantial manpower. To address this problem, the technical solution of the present disclosure efficiently constructs models that iterate through continuously added risk feature mining and algorithm optimization, based on basic model building, automatic model construction, and fusion model generation, and through real-time competition between the online model and the backup models. The technical solution of the present disclosure therefore provides not only a general technical framework and solution but also model capabilities adapted to different stages of business development.
System for efficiently constructing risk control model
Fig. 7 shows a block diagram of a system 700 for efficiently constructing a risk control model according to one embodiment of the present disclosure.
The system 700 includes a default model building module 702, a new model construction module 704, a model training module 706, a fusion model generation module 708, and an optimal model selection module 710.
The default model building module 702 constructs a basic model library and, when a new business is triggered, selects models from the basic model library to build a default model.
During risk control, the default model building module 702 can extract numerous risk modules, including active party, passive party, device, environment, behavior, relationship, conflict, mutation, FTG (Fraud to Gross), and the like. These risk modules are effectively characterized as variables, which can be divided into: historical-information summary variables (velocity class); derived variables, including individual variation and group probability; relationship variables; and so on.
For each risk module or variable, the default model building module 702 can build different basic models to construct the basic model library. For example, for the active party among identity variables, basic models such as account maturity, information-leak population, theft-prone population, and security-conscious population can be built from user gray lists, transaction history, and the like. For behavior, basic models such as account operation behavior, verification interaction behavior, scenario migration behavior, and fund flow behavior can be built from an account's short-term behavior and long-term historical behavior. For devices, basic models such as abnormal login device, abnormal operation device, abnormal tampering device, and Trojan-operated device can be built. Similarly, for addresses, basic models such as abnormal login address, abnormal operation address, abnormal tampering address, and false address can be built. For relationships, basic models such as counterparty relationship, scenario relationship, content relationship, and location relationship can be built. Those skilled in the art will appreciate that different basic models can be built for different risk modules or variables according to their types.
When a new business or new site is triggered, the default model building module 702 can freely select models in the basic model library to automatically build a default model suited to that business or site. Building the default model in effect involves merging multiple variables for modeling. Those skilled in the art will appreciate that, for different businesses or sites, different models in the basic model library can be chosen to carry out merged modeling of different variables.
The new model construction module 704 constructs a new model suited to the new business through automatic feature generation, automatic feature selection, and automatic parameter tuning.
When automatically constructing a new model, the new model construction module 704 can automatically learn or characterize different variables from existing data through feature engineering. Automatic feature generation automatically constructs candidate features relevant to the target task from a data set, typically converting temporal and relational data sets into a feature matrix usable for machine learning.
Feature derivation operations in automatic feature generation are divided into transformation, calculation, and aggregation; that is, original features are transformed, calculated, and aggregated to generate new candidate features. Of course, those skilled in the art will appreciate that features can be derived in many ways, and the specific processing should be chosen according to the needs of the business scenario.
For different variables, the new model construction module 704 can use different means to obtain features. For example, for text variables, the Capsule Network algorithm (capsule or vector neural network, hereinafter capsule network) can be used to obtain features; for sequence variables, LSTM (long short-term memory network) can be used; for historical-information summary (velocity) variables, genetic algorithms and reinforcement learning can be used; and for variable combination (variable combination), FTRL (Follow The Regularized Leader) can be used to combine features. Those skilled in the art will appreciate that different methods can be used to generate or obtain features for different variables.
The new model construction module 704 performs automatic feature selection on the automatically generated features and the original features. This can be done by obtaining feature importance (Feature Importance). The key steps of automatic feature selection are feature subset search and feature subset evaluation; combining a feature subset search mechanism with a feature subset evaluation mechanism yields a feature selection method. Feature selection reduces the number of features (dimensionality reduction), strengthens model generalization, and reduces overfitting; it also improves understanding of the features and their values. Through automatic feature selection, the most effective variable list for a given scenario or risk can be chosen from the existing variable pool plus the automatically generated features.
After feature selection is complete, the new model construction module 704 performs automatic parameter tuning. Common parameter tuning methods include grid search, random search, and Bayesian optimization. Those skilled in the art will appreciate that a parameter tuning method can be selected as needed.
Thus, through automatic feature generation, automatic feature selection, and automatic parameter tuning, the new model construction module 704 can construct a new model suited to the new business.
Model training module 706 is via transfer learning training default model and new model.
In most cases, in face of a certain particular problem in a certain field, it is less likely to find adequately training enough Data.But have benefited from transfer learning technology, and from the obtained model of other data sources training, by certain modification and perfect, It can be multiplexed in similar field.Transfer learning is to define multiple source domains (source domain) and a target Field (target domain) learns in source domain, and the knowledge migration that study is arrived is to target domain, Promote the learning effect (or performance) of target domain.
In one embodiment of the disclosure, the use of model training module 706 multi-task learning (Multi-task learning, For one kind of transfer learning) Lai Xunlian default model and new model.It will be understood by those skilled in the art that may be selected to use other Transfer learning method.
As a result, model training module 706 can by existing business, website model capability Rapid transplant to other business and station Point to can also realize quick landing even if only low volume data and label when being directed to new scene construction model, and makes It obtains model and has the outstanding performance of comparison.
The fusion model generation module 708 automatically fuses the trained default model and the trained new model to generate a fusion model.
The fusion model generation module 708 can fuse models through ensemble learning, so as to make efficient use of multiple features and multiple models to improve the performance of the online model. Thus, through automatic multi-model fusion, the fusion model generation module 708 can quickly integrate multiple models from the basic model library or multiple newly built models, thereby significantly improving the performance of the online model.
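One of the simplest ensemble learning schemes is a weighted average of the risk scores produced by several models for the same samples. The sketch below assumes this scheme; the disclosure does not specify which ensemble method is used, and the function name `fuse_scores` and its optional `weights` argument are hypothetical.

```python
from typing import List, Optional, Sequence


def fuse_scores(score_lists: Sequence[Sequence[float]],
                weights: Optional[Sequence[float]] = None) -> List[float]:
    """Fuse per-model risk scores by weighted averaging.

    score_lists[m][i] is the score of model m for sample i.
    With weights=None every model contributes equally.
    """
    if not score_lists:
        raise ValueError("at least one model is required")
    n_models = len(score_lists)
    if weights is None:
        weights = [1.0] * n_models
    if len(weights) != n_models:
        raise ValueError("one weight per model is required")
    n_samples = len(score_lists[0])
    if any(len(scores) != n_samples for scores in score_lists):
        raise ValueError("all models must score the same samples")
    total = sum(weights)
    return [
        sum(w * scores[i] for w, scores in zip(weights, score_lists)) / total
        for i in range(n_samples)
    ]
```

For example, the scores of the trained default model and the trained new model could be passed as two lists, with weights chosen from their validation metrics.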
The optimal model selection module 710 uses the trained default model as the online model and uses the trained new model and the fusion model as backup models. When one of the backup models outperforms the online model, the optimal model selection module 710 replaces the online model with that backup model.
Once the trained default model, the trained new model, and the fusion model are available, the optimal model selection module 710 uses champion/challenger testing or A/B testing to compare the online model (i.e., the established strategy, or champion strategy) with one or more candidate models (i.e., challenger models).
In champion/challenger mode, the optimal model selection module 710 usually uses the trained default model as the online model, because it is built on existing modules, and uses the trained new model and the fusion model as backup models. Once a backup model is found to perform better than the online model, the backup model goes online and replaces it, and the former online model becomes a backup model. This keeps the online model at its best performance at all times.
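The swap between the online model and a better backup model can be sketched as follows. This is a minimal sketch under assumptions: each model is summarized by a single metric where higher is better, and the `Candidate` class, the `select_online` function, and the `margin` parameter (a safeguard against swapping on negligible differences) are hypothetical and not part of the disclosure.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    name: str
    metric: float  # e.g. AUC on recent traffic; higher is better


def select_online(online: Candidate, backups: List[Candidate],
                  margin: float = 0.0) -> Tuple[Candidate, List[Candidate]]:
    """Return (new_online, new_backups).

    The best backup replaces the online model only if it beats the online
    model by more than `margin`; the replaced model becomes a backup.
    """
    if not backups:
        return online, []
    best = max(backups, key=lambda c: c.metric)
    if best.metric > online.metric + margin:
        new_backups = [c for c in backups if c is not best] + [online]
        return best, new_backups
    return online, list(backups)
```

Called repeatedly on fresh metrics, this keeps the best-performing model online while retaining the others as challengers.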
Thus, the system 700 for efficiently constructing a risk control model can output an optimal dynamic model.
The present disclosure proposes a scheme for efficiently constructing a risk control model. In the current art, the process of creating a new model is complex, and its data cleansing, model training, and model deployment all require substantial manpower. To address this problem, the technical solution of the disclosure is based on building a default model from basic models, automatic model building, and fusion model generation, and, through real-time competition between the online model and the backup models, efficiently builds models that continually mine newly emerging risk features and iterate with optimized algorithms. The technical solution of the disclosure therefore provides not only a general technical framework and solution, but also model capabilities adapted to different stages of business development.
Method for efficiently updating a risk control model
Fig. 8 shows a flowchart of a method 800 for efficiently updating a risk control model according to one embodiment of the disclosure.
At 802, changes in the performance of the risk control model and changes in its input data are monitored.
A performance change of the risk control model includes a decline or an abnormal fluctuation in its performance. In another embodiment of the disclosure, monitoring may be triggered periodically (e.g., Week+1, Day+1, etc.). In yet another embodiment of the disclosure, monitoring may also be triggered manually. These triggering modes depend on the automation of the underlying data; that is, sample labels and variable data can be prepared automatically and updated on a schedule. For example, the label and variable data can be selected from tables in different underlying data warehouses.
A change in the input data of the risk control model includes a change in the distribution of the input data and the occurrence of newly added events. A change in the distribution of the input data can cause variables in the model to be added or removed. This can be handled through feature engineering: as described above, new and different variables are automatically learned or characterized by feature engineering, and feature screening is performed on the original and new variables to obtain suitable model structural parameters. Newly added events can change the black/white labels of samples, so the hyperparameters of the model may need to be adjusted.
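The two kinds of monitored change can be turned into triggers with a short sketch. The disclosure does not say how a distribution change is measured; the sketch assumes the population stability index (PSI), a common drift measure, and the thresholds `max_auc_drop=0.02` and `psi_threshold=0.2` as well as the function names are illustrative assumptions.

```python
import math
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population stability index of one variable: baseline vs. recent values."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant baseline

    def shares(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clip out-of-range values
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def detect_triggers(baseline_auc: float, recent_auc: float,
                    baseline_values: Sequence[float],
                    recent_values: Sequence[float],
                    max_auc_drop: float = 0.02,
                    psi_threshold: float = 0.2) -> List[str]:
    """Map monitored changes to actions: a performance drop requests a refit,
    an input distribution shift requests a retrain."""
    triggers = []
    if baseline_auc - recent_auc > max_auc_drop:
        triggers.append("refit")
    if psi(baseline_values, recent_values) > psi_threshold:
        triggers.append("retrain")
    return triggers
```

A periodic or manual trigger would simply call `detect_triggers` on a schedule or on request, using the automatically prepared label and variable data.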
At 804, when the risk control model exhibits a performance change, the risk control model is refitted to obtain a refitted risk control model.
When the risk control model exhibits a performance change, especially a decline or an abnormal fluctuation, it is usually necessary to evaluate the performance of the online model and of several candidate models and to select the best model from among them. The candidate models may be models of the same class with different hyperparameters.
When model performance declines or fluctuates abnormally, the problem is usually one of two kinds. One is underfitting, i.e., high bias: the model has not learned the features of the data set, so its accuracy is low on both the training set and the test set. The other is overfitting, i.e., high variance: the model has learned all features including noise, so its accuracy is high on the training set but low when applied to a new data set. In this case, the risk control model can be refitted by considering model complexity and data set size. The choice of model complexity is not repeated here; see the automatic modeling process above.
The size of the data set strongly affects model performance. In the case of overfitting, the model has learned all features including noise, so obtaining more data samples can reduce the weight of the noise. In the case of underfitting, adding training data can enable the model to learn the features of the data set.
It is therefore possible to introduce new samples from the data warehouse and add them to the sample pool. Different sample sets are formed by automatically selecting samples from the sample pool, and the risk control model is then refitted with the different sample sets. In this way, the generalization ability of the model can be improved.
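The refitting loop (add new samples to the pool, draw different sample sets, refit on each, keep the best fit) can be sketched as follows. The sketch rests on several assumptions: the "model" is a toy one-feature threshold classifier, sample sets are drawn by random sampling without replacement, and the best fit is chosen by accuracy on a holdout set. All function names are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[float, int]  # (feature value, label 0/1)


def fit_threshold(samples: Sequence[Sample]) -> float:
    """Toy model: the midpoint between the class means of a single feature."""
    pos = [x for x, y in samples if y == 1]
    neg = [x for x, y in samples if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0


def accuracy(threshold: float, samples: Sequence[Sample]) -> float:
    """Share of samples for which `x > threshold` matches the label."""
    hits = sum(int((x > threshold) == (y == 1)) for x, y in samples)
    return hits / len(samples)


def refit_with_sample_sets(pool: Sequence[Sample], new_samples: Sequence[Sample],
                           holdout: Sequence[Sample], n_sets: int = 5,
                           set_size: int = 15,
                           seed: int = 0) -> Tuple[Optional[float], float]:
    """Add new samples to the pool, draw several sample sets, refit on each,
    and return the fit with the best holdout accuracy."""
    merged: List[Sample] = list(pool) + list(new_samples)
    rng = random.Random(seed)
    best_threshold, best_acc = None, -1.0
    for _ in range(n_sets):
        subset = rng.sample(merged, min(set_size, len(merged)))
        if len({y for _, y in subset}) < 2:
            continue  # a sample set needs both classes to be fitted
        threshold = fit_threshold(subset)
        acc = accuracy(threshold, holdout)
        if acc > best_acc:
            best_threshold, best_acc = threshold, acc
    return best_threshold, best_acc
```

With a real model, `fit_threshold` and `accuracy` would be replaced by the model's own training and evaluation routines.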
At 806, when the input data of the risk control model changes, the risk control model is retrained to obtain a retrained risk control model.
When the input data of the risk control model changes, a change in the distribution of the input data can cause variables in the model to be added or removed. This can be handled through feature engineering: new and different variables are automatically learned or characterized by feature engineering, and feature screening is performed on the original and new variables to obtain suitable model structural parameters. Newly added events can change the black/white labels of samples, so the hyperparameters of the model may need to be adjusted.
Retraining the risk control model further includes: adjusting the structural parameters of the risk control model; and adjusting the hyperparameters of the risk control model.
Adjusting the structural parameters of the risk control model further includes: automatically generating new features based on changes in the data; performing feature screening on the features of the risk control model; and adjusting the structural parameters of the risk control model using the screened features. Adjusting the structural parameters can essentially be implemented through the automatic modeling process.
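The first of these steps, automatically generating new candidate features, can be sketched by applying a transformation, a computation, and an aggregation to raw records. The record layout (`user_id`, `amount`, `balance`) and every generated feature name are hypothetical; the sketch only shows the kind of derivation meant, not the features used in the disclosure.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def generate_features(transactions: Sequence[Dict]) -> List[Dict]:
    """Derive candidate features from raw transaction records.

    Each record is assumed to have the keys user_id, amount and balance.
    """
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for t in transactions:
        totals[t["user_id"]] += t["amount"]
        counts[t["user_id"]] += 1

    enriched = []
    for t in transactions:
        uid = t["user_id"]
        user_mean = totals[uid] / counts[uid]
        enriched.append({
            **t,
            # transformation of a single original variable
            "log_amount": math.log1p(t["amount"]),
            # computation across two original variables
            "amount_to_balance": t["amount"] / t["balance"] if t["balance"] else 0.0,
            # aggregation over all records of the same user
            "user_txn_count": counts[uid],
            "user_mean_amount": user_mean,
            "amount_vs_user_mean": t["amount"] / user_mean if user_mean else 0.0,
        })
    return enriched
```

The generated columns would then go through feature screening together with the original variables before the structural parameters are adjusted.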
The hyperparameters of the risk control model are adjusted using one of grid search, random search, and Bayesian optimization.
At 808, the refitted or retrained risk control model is updated with streaming data through incremental learning.
Updating the refitted or retrained risk control model with streaming data through incremental learning is performed using an online learning algorithm such as the FTRL algorithm or the Online Random Forest algorithm.
Online learning algorithms are a form of incremental learning that emphasizes real-time training. When handling streaming data, each training step does not use the full data set; instead, starting from the previously trained parameters, the model is updated with one sample at a time, so that the model is updated quickly and its timeliness is improved.
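The one-sample-at-a-time update can be illustrated with the FTRL algorithm mentioned above. The sketch follows the commonly published per-coordinate FTRL-Proximal update for logistic regression; the class name `FTRLProximal`, the default values of `alpha`, `beta`, `l1`, and `l2`, and the use of dictionaries for sparse features are assumptions made for illustration and are not taken from the disclosure.

```python
import math
from typing import Dict


class FTRLProximal:
    """Logistic regression trained online with per-coordinate FTRL-Proximal."""

    def __init__(self, alpha: float = 0.1, beta: float = 1.0,
                 l1: float = 0.0, l2: float = 0.0) -> None:
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z: Dict[str, float] = {}  # accumulated adjusted gradients
        self.n: Dict[str, float] = {}  # accumulated squared gradients

    def _weight(self, key: str) -> float:
        """Closed-form weight for one coordinate; L1 keeps small ones at zero."""
        z = self.z.get(key, 0.0)
        if abs(z) <= self.l1:
            return 0.0
        sign = 1.0 if z > 0 else -1.0
        denom = (self.beta + math.sqrt(self.n.get(key, 0.0))) / self.alpha + self.l2
        return -(z - sign * self.l1) / denom

    def predict(self, x: Dict[str, float]) -> float:
        """Predicted probability of the positive class for sparse features x."""
        s = sum(self._weight(k) * v for k, v in x.items())
        s = max(min(s, 35.0), -35.0)  # avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-s))

    def update(self, x: Dict[str, float], y: int) -> float:
        """Update the model with a single labeled sample; returns the
        prediction made before the update."""
        p = self.predict(x)
        for k, v in x.items():
            g = (p - y) * v  # gradient of the log loss for this coordinate
            n = self.n.get(k, 0.0)
            sigma = (math.sqrt(n + g * g) - math.sqrt(n)) / self.alpha
            self.z[k] = self.z.get(k, 0.0) + g - sigma * self._weight(k)
            self.n[k] = n + g * g
        return p
```

Each call to `update` touches only the features present in the sample, which is what makes this style of update suitable for high-volume streaming data.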
Online learning is measured against the optimal strategy that could be designed if all knowledge were available in hindsight. The gap between the strategy actually followed and this optimal strategy is called regret: regret at not having chosen the optimal strategy from the beginning. Ideally, this gap keeps shrinking as time increases. What online learning pursues, therefore, is no regret (no-regret).
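The notion of regret is commonly formalized as follows. The notation is not from the original text and is introduced here only for illustration: $w_t$ denotes the model parameters used in round $t$, $\ell_t$ the loss on the sample seen in round $t$, and $\mathcal{W}$ the set of admissible parameters.

```latex
R_T \;=\; \sum_{t=1}^{T} \ell_t(w_t) \;-\; \min_{w \in \mathcal{W}} \sum_{t=1}^{T} \ell_t(w),
\qquad
\text{no-regret:}\quad \lim_{T \to \infty} \frac{R_T}{T} = 0 .
```

Under this reading, the gap that shrinks over time is the average regret per round $R_T / T$: the online learner's average loss approaches that of the best fixed strategy chosen in hindsight.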
Through online/incremental learning, frequent changes in risk patterns (as reflected in the streaming data) can be perceived, so that the risk control model can be iterated rapidly.
At 810, the risk control model is used as the online model, and the updated refitted risk control model and the updated retrained risk control model are used as backup models.
At 812, when one of the backup models outperforms the online model, the online model is replaced with that backup model.
Model performance can be tested by comparing the evaluation metrics of the models (e.g., AUC, F1, KS).
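The three metrics named above can be computed with a few lines of plain Python. This is a minimal sketch using the standard definitions; the function names and the default decision threshold of 0.5 for F1 are assumptions, and both classes are assumed to be present in the labels.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ks(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Largest gap between the cumulative score distributions of the
    positive and negative classes."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    return max(
        abs(sum(p <= t for p in pos) / len(pos) - sum(n <= t for n in neg) / len(neg))
        for t in sorted(set(scores))
    )


def f1(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Harmonic mean of precision and recall at the given score threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Computing the same metric for the online model and for each backup model on the same recent samples gives the values needed for the comparison.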
Likewise, the comparison between the online model and the backup models, and the replacement, can be carried out in the champion/challenger (Champion & Challenger) mode described above. In that mode, once a backup model is found to perform better than the online model, the backup model goes online and replaces it, and the former online model becomes a backup model. This keeps the online model at its best performance at all times.
Fig. 9 shows a schematic diagram of the method for efficiently updating a risk control model according to one embodiment of the disclosure. Fig. 9 illustrates automatic model refitting, automatic model retraining, and incremental learning.
In automatic model refitting, the triggers include performance monitoring and model operation triggers. As described above, what is monitored is a decline or an abnormal fluctuation in the performance of the risk control model. Alternatively, monitoring may be triggered periodically (e.g., Week+1, Day+1, etc.), or it may be triggered manually. These triggering modes depend on the automation of the underlying data; that is, sample labels and variable data can be prepared automatically and updated on a schedule.
Automatic model refitting mainly includes automatic sample selection and automatic model fitting. Automatic sample selection includes introducing new samples from the data warehouse into the sample pool and forming different sample sets by automatically selecting samples from the pool. Automatic model fitting refits the risk control model with the different sample sets, which can improve the generalization ability of the model.
Model evaluation is then performed, i.e., the performance of the models is compared automatically, and a model scheme is then selected. After the model scheme is selected, the model can go online. In this embodiment, the selected model is deployed manually, and the strategy does not need to be adjusted.
In automatic model retraining, triggered by a new risk or a new business, new and different variables are automatically learned or characterized by feature engineering, and feature screening is performed on the original and new variables, thereby adjusting the structural parameters of the model. Changes in the black/white labels of samples caused by newly added events may require the hyperparameters of the model to be adjusted. This process can essentially be implemented through the automatic modeling process.
In incremental learning, transactions whose nature has been determined trigger the incremental learning. Based on the knowledge base and the streaming data, each training step does not use the full data set; instead, starting from the previously trained parameters, the model is updated with one sample at a time, so that the model is updated quickly and its timeliness is improved.
System for efficiently updating a risk control model
Fig. 10 shows a block diagram of a system 1000 for efficiently updating a risk control model according to one embodiment of the disclosure.
The system 1000 includes a monitoring module 1002, a model refitting module 1004, a model retraining module 1006, an incremental learning module 1008, and an optimal model selection module 1010.
The monitoring module 1002 monitors changes in the performance of the risk control model and changes in its input data. A performance change of the risk control model includes a decline or an abnormal fluctuation in its performance. A change in the input data of the risk control model includes a change in the distribution of the input data and the occurrence of newly added events.
The model refitting module 1004 refits the risk control model when the risk control model exhibits a performance change, to obtain a refitted risk control model.
Refitting the risk control model by the model refitting module 1004 further includes: introducing new samples from the data warehouse and adding them to the sample pool; forming different sample sets by automatically selecting samples from the sample pool; and refitting the risk control model with the different sample sets.
The model retraining module 1006 retrains the risk control model when the input data of the risk control model changes, to obtain a retrained risk control model.
Retraining the risk control model by the model retraining module 1006 further includes: adjusting the structural parameters of the risk control model; and adjusting the hyperparameters of the risk control model.
Adjusting the structural parameters of the risk control model by the model retraining module 1006 further includes: automatically generating new features based on changes in the data; performing feature screening on the features of the risk control model; and adjusting the structural parameters of the risk control model using the screened features.
The model retraining module 1006 adjusts the hyperparameters of the risk control model using one of grid search, random search, and Bayesian optimization.
The incremental learning module 1008 updates the refitted or retrained risk control model with streaming data through incremental learning.
The incremental learning module 1008 performs this update using the FTRL algorithm or the Online Random Forest algorithm.
The optimal model selection module 1010 uses the risk control model as the online model, uses the updated refitted risk control model and the updated retrained risk control model as backup models, and, when one of the backup models outperforms the online model, replaces the online model with that backup model.
The present disclosure proposes a scheme for efficiently updating a risk control model. The scheme enables rapid model updating and iteration against constantly changing risks and greatly increases the adaptive capability of the model, thereby improving the ability to prevent and control risk. At the same time, automatic refitting (refit), automatic retraining (retrain), online learning, and the like greatly shorten the model training and deployment cycle and improve the efficiency of model development. The technical solution of the disclosure therefore provides not only a general technical framework and solution, but also model capabilities adapted to different stages of business development.
Each step and module of the methods and systems described above for efficiently constructing a risk control model and for efficiently updating a risk control model can be implemented in hardware, software, or a combination thereof. If implemented in hardware, the various illustrative steps, modules, and circuits described in connection with the invention can be implemented or executed with a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic component, a hardware component, or any combination thereof. The general-purpose processor may be a processor, microprocessor, controller, microcontroller, state machine, or the like. If implemented in software, the various illustrative steps and modules described in connection with the invention can be stored on or transmitted over a computer-readable medium as one or more instructions or code. Software modules implementing the various operations of the invention may reside in a storage medium such as RAM, flash memory, ROM, EPROM, EEPROM, registers, a hard disk, a removable disk, a CD-ROM, or cloud storage. The storage medium can be coupled to a processor so that the processor can read information from and write information to the storage medium and execute the corresponding program modules to implement the steps of the invention. Moreover, software-based embodiments can be uploaded, downloaded, or accessed remotely through suitable means of communication, including, for example, the Internet, the World Wide Web, an intranet, software applications, cable (including fiber-optic cable), magnetic communication, electromagnetic communication (including RF, microwave, and infrared communication), electronic communication, or other such means of communication.
It should also be noted that these embodiments may be described as processes depicted as flowcharts, flow diagrams, structure diagrams, or block diagrams. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations can be rearranged.
The disclosed methods, apparatus, and systems should not be construed as limiting in any way. Rather, the invention covers all novel and non-obvious features and aspects of the various disclosed embodiments, individually and in various combinations and sub-combinations with one another. The disclosed methods, apparatus, and systems are not limited to any specific aspect or feature or combination thereof, nor do the disclosed embodiments require that any one or more specific advantages be present or that specific or all technical problems be solved.
The embodiments of the invention have been described above with reference to the accompanying drawings, but the invention is not limited to the specific embodiments described, which are merely illustrative and not restrictive. Inspired by the invention, those of ordinary skill in the art can make many variations without departing from the purpose of the invention and the scope protected by the claims, and all of these fall within the protection scope of the invention.

Claims (15)

1. A method for efficiently constructing a risk control model, comprising:
constructing a basic model library so that, when a new business is triggered, models in the basic model library are selected to build a default model;
constructing a new model suited to the new business through automatic feature generation, automatic feature selection, and automatic parameter tuning;
training the default model and the new model via transfer learning;
automatically fusing the trained default model and the trained new model to generate a fusion model;
using the trained default model as an online model, and using the trained new model and the fusion model as backup models; and
when one of the backup models outperforms the online model, replacing the online model with that backup model.
2. The method of claim 1, wherein building the default model further comprises:
refining risk modules for each scenario;
constructing a basic model for each risk module, and constructing the basic model library based on the basic models;
when a new business is triggered, selecting the corresponding basic models in the basic model library; and
building a default model suited to the new business using the corresponding basic models.
3. The method of claim 2, wherein the refined risk modules may include active party, passive party, device, environment, behavior, relationship, conflict, mutation, and FTG (Fraud to Gross).
4. The method of claim 1, wherein constructing the new model further comprises:
obtaining an original variable pool;
automatically generating different types of features based on the original variables in the original variable pool;
selecting variables suited to the scenario from the original variable pool and the automatically generated features to generate a variable list;
performing automatic parameter tuning for the variable list; and
obtaining a new model suited to the scenario.
5. The method of claim 1, wherein the automatic feature generation includes transforming, computing, and aggregating original features to generate new candidate features.
6. The method of claim 1, wherein the automatic feature selection involves feature subset search and feature subset evaluation.
7. The method of claim 1, wherein the automatic parameter tuning uses one of grid search, random search, and Bayesian optimization.
8. A system for efficiently constructing a risk control model, comprising:
a default model building module that constructs a basic model library so that, when a new business is triggered, models in the basic model library are selected to build a default model;
a new model building module that constructs a new model suited to the new business through automatic feature generation, automatic feature selection, and automatic parameter tuning;
a model training module that trains the default model and the new model via transfer learning;
a fusion model generation module that automatically fuses the trained default model and the trained new model to generate a fusion model; and
an optimal model selection module that uses the trained default model as an online model, uses the trained new model and the fusion model as backup models, and, when one of the backup models outperforms the online model, replaces the online model with that backup model.
9. The system of claim 8, wherein the default model building module further:
refines risk modules for each scenario;
constructs a basic model for each risk module, and constructs the basic model library based on the basic models;
when a new business is triggered, selects the corresponding basic models in the basic model library; and
builds a default model suited to the new business using the corresponding basic models.
10. The system of claim 9, wherein the refined risk modules may include active party, passive party, device, environment, behavior, relationship, conflict, mutation, and FTG (Fraud to Gross).
11. The system of claim 8, wherein the new model building module further:
obtains an original variable pool;
automatically generates different types of features based on the original variables in the original variable pool;
selects variables suited to the scenario from the original variable pool and the automatically generated features to generate a variable list;
performs automatic parameter tuning for the variable list; and
obtains a new model suited to the scenario.
12. The system of claim 8, wherein the automatic feature generation performed by the new model building module includes transforming, computing, and aggregating original features to generate new candidate features.
13. The system of claim 8, wherein the automatic feature selection performed by the new model building module involves feature subset search and feature subset evaluation.
14. The system of claim 8, wherein the automatic parameter tuning performed by the new model building module uses one of grid search, random search, and Bayesian optimization.
15. A computer-readable storage medium storing instructions that, when executed, cause a machine to perform the method of any one of claims 1-8.
CN201910587071.9A 2019-07-01 2019-07-01 Method and system for constructing risk control model Active CN110334814B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910587071.9A CN110334814B (en) 2019-07-01 2019-07-01 Method and system for constructing risk control model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910587071.9A CN110334814B (en) 2019-07-01 2019-07-01 Method and system for constructing risk control model

Publications (2)

Publication Number Publication Date
CN110334814A true CN110334814A (en) 2019-10-15
CN110334814B CN110334814B (en) 2023-05-02

Family

ID=68143032

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910587071.9A Active CN110334814B (en) 2019-07-01 2019-07-01 Method and system for constructing risk control model

Country Status (1)

Country Link
CN (1) CN110334814B (en)


Also Published As

Publication number Publication date
CN110334814B (en) 2023-05-02

Similar Documents

Publication Publication Date Title
CN110334814A (en) Method and system for constructing a risk control model
CN110310206A (en) Method and system for updating a risk control model
Zhang et al. A return-cost-based binary firefly algorithm for feature selection
Ahmadianfar et al. RUN beyond the metaphor: An efficient optimization algorithm based on Runge Kutta method
Han et al. A survey on metaheuristic optimization for random single-hidden layer feedforward neural network
CN112329348A (en) Intelligent decision-making method for military countermeasure game under incomplete information condition
CN108351986A (en) Learning system, learning device, learning method, learning program, training data generation device, training data generation method, training data generation program, terminal device, and threshold changing device
CN107909153A (en) Model-based policy search learning method based on conditional generative adversarial network
CN106709482A (en) Method for identifying genetic relationship of figures based on self-encoder
CN108520166A (en) Drug target prediction method based on multi-similarity network random walk
CN110222634A (en) Human posture recognition method based on convolutional neural networks
Ju et al. Online data migration model and ID3 algorithm in sports competition action data mining application
CN111061959B (en) Group intelligent software task recommendation method based on developer characteristics
CN116757497B (en) Multi-mode military intelligent auxiliary combat decision-making method based on graph-like perception transducer
CN111639677B (en) Garbage image classification method based on multi-branch channel capacity expansion network
Li et al. EMFNet: Enhanced multisource fusion network for land cover classification
CN103093247A (en) Automatic classification method for plant images
CN108647772A (en) Method for eliminating errors in slope monitoring data
CN105608118B (en) Result pushing method based on user interaction information
CN110245292A (en) Natural language relation extraction method based on neural network filtering of noise features
CN107194468A (en) Decision tree incremental learning algorithm for information big data
CN116977661A (en) Data processing method, device, equipment, storage medium and program product
Ali et al. Indoor scene recognition using ResNet-18
Jiang et al. ATSA: An Adaptive Tree Seed Algorithm based on double-layer framework with tree migration and seed intelligent generation
Joseph et al. GANDALF: Gated Adaptive Network for Deep Automated Learning of Features

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20201010

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: P.O. Box 847, Fourth Floor, Capital Building, Grand Cayman, Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

Effective date of registration: 20201010

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Applicant before: Advanced innovation technology Co.,Ltd.

GR01 Patent grant